Browser Automation with Selenium: Fingerprints, recognizability and traceability? - selenium-webdriver

I want to use Selenium WebDriver to drive a browser and scrape some website content with it. Even if it's not the fastest method, it has many advantages for me, such as executing scripts.
Many websites forbid automated access, for example search engines like Google or Bing.
For one tool I need to scrape the estimated result stats from Google for several keywords. It would work like this: the simulated browser visits google.com, types in a keyword and scrapes the results, then after a short pause types in the next keyword, scrapes the results, and so on.
My question is: can a website recognize that I'm using Selenium to drive the browser instead of operating it by hand? The Google case in particular gives me some doubts. I know Selenium is partly developed by Google, or at least by some people working for Google. So does Selenium leave fingerprints, or is it impossible to tell whether I'm using the browser myself or driving it with Selenium, even for Google?

No, with WebDriver nobody can directly see that Selenium is driving the browser rather than you operating it by hand. I'm not sure about the old Selenium RC, but it should work the same way. Here's how it works:
Selenium opens up a browser with a clean profile (or with a profile you selected)
Selenium is hooked up to the browser so it can steer it, control it. But the browser still does most of the work. Basically, Selenium replaces the user inputs to the browser, but not more.
You can easily verify this by reading the contents of the HTTP headers sent by your browser.
If you ever actually needed Selenium to be recognized by your server, you can use Browsermob-proxy and add a custom header to your requests.
All that said, there is one thing you must be aware of. While there's no way to detect Selenium directly, there can be some indirect clues picked up by the website you're visiting. Those usually include scanning for too many requests made in virtually no time - this might be an issue for you. Make sure your Selenium is behaving like a user.
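The "behave like a user" advice can be sketched as randomized pauses between searches. This is only a minimal sketch: the 8 second base and 4 second jitter are arbitrary assumptions, `search` is a hypothetical callback that would wrap your own Selenium calls, and nothing here guarantees that a site will not flag the traffic.

```python
import random
import time


def human_delays(count, base_seconds=8.0, jitter_seconds=4.0, seed=None):
    """Return `count` pause lengths, each base +/- jitter, never below 1 second."""
    rng = random.Random(seed)
    delays = []
    for _ in range(count):
        pause = base_seconds + rng.uniform(-jitter_seconds, jitter_seconds)
        delays.append(max(1.0, pause))
    return delays


def run_paced(keywords, search, sleep=time.sleep, **delay_options):
    """Call `search(keyword)` for each keyword, pausing after every call.

    `search` is a hypothetical callback (for example a function that types
    the keyword into the page via WebDriver and returns the scraped text).
    """
    delays = human_delays(len(keywords), **delay_options)
    results = {}
    for keyword, pause in zip(keywords, delays):
        results[keyword] = search(keyword)
        sleep(pause)
    return results
```

Passing `sleep` as a parameter keeps the pacing logic separate from the real clock, so it can be exercised without waiting.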
EDIT 2016/04:
Apparently it is possible: https://stackoverflow.com/a/33403473/2930045 states that a company can do it. My guess - and it is nothing but a guess - is that they can run some JS that Selenium installs into the browser to operate.

Signs point to yes: sites are able to recognize that you are using Selenium.
Counterexample: www.stubhub.com detects and blocks my browser instance launched using Selenium, while "normal" browsing done manually (not using the browser launched by the Selenium web driver) works without issue.
See this Stack Overflow question for additional details:
Can a website detect when you are using selenium with chromedriver?

Related

How to use chromium engine inside google chrome to render my application

I wonder whether it is possible to use the Chromium engine inside Google Chrome, or Google Chrome itself, to render a web page inside my WPF application, instead of using the traditional WebView (it's the IE engine and it's awful -_-) or implementing CEFSharp (it uses about 200 MB of space for the Chromium engine alone).
In that case I would need the target PC to have Google Chrome or another browser (Firefox or ...) installed.
So ... is there any solution?
Thanks in advance.
EDIT
I want to create applications with a web-based UI because it is easy and powerful. I know some frameworks provide this, e.g. CEF Sharp WPF or Electron, but they bundle a full Chromium engine with the app. I don't want this.
I want to make my app as light as possible, and my idea is to use the Chromium engine of a modern browser, which almost everyone has installed.
For example, imagine that the user has installed Google Chrome.
First I locate the installation folder.
Then I use a command like chromium.exe -render path/to/file.html (imaginary) to render my application UI.
Finally I bind the UI events to my native code (e.g. C# (WPF) or any other language you can create desktop apps with).
One solution is creating web apps by installing a website with the browser, but with that you cannot, for example, create or read files on the user's PC, or do any similar operation.
I'm looking for the most light-weight solution...
There is a new Chromium based WebView2 control that you can use to embed modern web content in your WPF application.
Please refer to the docs for more information about the prerequisites and how to use it:
Getting started with WebView2 in WPF
Explanation
So, let's say that you want your UI to be rendered in a chromium environment(aka a browser)… right?
let's take a look at electron js:
it uses NodeJS as backend.
it uses an embedded browser for frontend.
the language used is JavaScript due to NodeJS.
So, you want to use the client's browser to render your frontend instead of embedding a browser inside your app.
Well, don't embed it!
You can create a web application (e.g. opened by typing localhost:<port> in the browser) using NodeJS and handle your IPC (between frontend and backend) using AJAX calls or a socket connection.
That way you are doing exactly what an Electron app does, except that Electron uses a bundled browser.
Now your app is lighter, and if your client already has NodeJS installed, you don't need to bundle NodeJS either!
--- inspired by jupyter notebooks ---
Possible Solutions
use NodeJS as backend.
use python and combine it with Flask or Django as backend. (I think this would be the most lightweight solution)
use PHP as backend. (the best, personal opinion)
use ASP.NET/Blazor as backend. (as mentioned in the comments; but doesn't seem to be a lightweight solution)
or use any language that you can create a web application with!
make a runApp.bat or runApp.sh to simply run your server and open the browser automatically.
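As a rough illustration of the "local server plus the user's own browser" idea, here is a minimal sketch using only the Python standard library instead of Flask or Django. The `/api/ping` route, the page markup, and the names `AppHandler`, `start_server`, and `main` are all assumptions made up for this example. The page calls back into the native side with an AJAX request, which plays the role of the IPC described above.

```python
import json
import threading
import webbrowser
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Tiny frontend: one button that calls the backend over AJAX (hypothetical UI).
PAGE = b"""<!doctype html>
<title>Local app</title>
<button onclick="fetch('/api/ping').then(r => r.json())
  .then(d => document.getElementById('out').textContent = d.message)">Ping</button>
<pre id="out"></pre>
"""


class AppHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/ping":
            # This is where native code (file access etc.) would run.
            body = json.dumps({"message": "pong from native code"}).encode("utf-8")
            self._send(200, "application/json", body)
        elif self.path == "/":
            self._send(200, "text/html; charset=utf-8", PAGE)
        else:
            self._send(404, "text/plain", b"not found")

    def _send(self, status, content_type, body):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_server(port=0):
    """Serve on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), AppHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def main():
    server = start_server()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    webbrowser.open(url)  # opens the user's default browser
    try:
        input("Serving %s, press Enter to quit\n" % url)
    finally:
        server.shutdown()
```

Calling `main()` from a script takes the place of the runApp.bat / runApp.sh launcher: it starts the server and opens the browser in one step. Binding to 127.0.0.1 keeps the app reachable only from the local machine.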

How to use Selenium WebDriver to find Firefox add-ons warning dialog

I am testing downloading and installing an add-on that our company makes. I can add the domain to the Firefox profile whitelist to eliminate the first dialog, but then FF displays a second one that says "Install add-ons only from authors whom you trust". I can't find a way for Selenium to find it. It's the one that looks like this:
I have tried driver.switchTo().alert().accept() - this is not an alert.
I have tried driver.switchTo().findElement(linkText("Install")) - nothing found.
I have tried using SikuliWebDriver to find an element by location (picking some random ints to work off of) and then just send keys like Keys.TAB and Keys.ENTER, but as I step through in debug mode, driver.findElementByLocation(20,40) never returns.
I have tried driver.getKeyboard().sendKeys(Keys.TAB) (sending two tabs and an enter). Also never returns.
I think this dialog is generated by Javascript but I am unable to find out what JS generates it. Ideally I could find a name or an id for the button in the dialog and then use JavascriptExecutor to run the command. But without any kind of handle I'm stuck.
Any ideas?
Selenium can only see the DOM (document object model). It can't test desktop applications. The dialog shown is part of the Firefox application and not part of the DOM, so Selenium can't see or access or interact with it. Sad but true. Try Ranorex?

How do I make selenium see the network requests made by a web browser?

I have a dotnet Selenium web driver app.
When I'm testing the page one of the things I need to confirm is that a flash object on the page has pulled correct content from a content store on my site. (i.e. the flash object should be loading content from /stuff/info.txt and including that content within the animation.)
As a human looking at this I can use the chrome network tab and see that /stuff/info.txt has been accessed.
How can I make Selenium execute a similar watch and see the network requests made by a web browser?
I neither wrote nor tested this, but someone did it here: http://www.softwareishard.com/blog/firebug/automate-page-load-performance-testing-with-firebug-and-selenium/
Basically, all the requests are exported into a HAR (HTTP Archive) file via the Firebug and NetExport plugins.
Please give us your feedback if you give it a try!
Cheers !
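Once a HAR file has been exported, checking it for a request is a matter of reading JSON. The sketch below is in Python rather than .NET and assumes the standard HAR layout (`log.entries[].request.url` and `log.entries[].response.status`); the function names are made up for illustration, and treating 200 and 304 as success is an assumption you may want to change.

```python
import json
from urllib.parse import urlsplit


def load_har(filename):
    """Parse a HAR file from disk (utf-8-sig tolerates a leading BOM)."""
    with open(filename, encoding="utf-8-sig") as handle:
        return json.load(handle)


def requested_paths(har):
    """Return the URL path of every request recorded in a parsed HAR document."""
    entries = har.get("log", {}).get("entries", [])
    return [urlsplit(entry["request"]["url"]).path for entry in entries]


def was_requested(har, path, ok_statuses=(200, 304)):
    """True if some entry requested `path` and received one of `ok_statuses`."""
    for entry in har.get("log", {}).get("entries", []):
        if urlsplit(entry["request"]["url"]).path != path:
            continue
        if entry["response"]["status"] in ok_statuses:
            return True
    return False
```

For the case in the question, the assertion would be `was_requested(load_har(exported_file), "/stuff/info.txt")`, where `exported_file` is whatever path the export was written to.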
I assume you want to automate what the browser's developer tools do: something like Firebug, but for verification from code.
I don't believe Selenium has such features. For now, you will not be able to achieve this.

Scrapy or Selenium or Mechanize to scrape web data?

I want to scrape some data from a website.
Basically, the website has a tabular display that shows around 50 records. For more records, the user has to click a button, which makes an AJAX call to get and show the next 50 records.
I have previous knowledge of Selenium WebDriver (Python) and could do this very quickly in Selenium. But Selenium is more of an automation testing tool, and it is very slow.
I did some R&D and found that using Scrapy or Mechanize, I can also do the same thing.
Should I go for Scrapy, Mechanize, or Selenium for this?
I would recommend you to go with a combination of Mechanize and ExecJS (https://github.com/sstephenson/execjs) to execute any javascript requests you might come across. I have used those two gems in combination for quite some time now and they do a great job.
You should choose this instead of Selenium because it will be a lot faster than having to render the entire page in a headless browser.
I'd definitely choose Scrapy. If you need to handle JavaScript, you can try Scrapy + Splash.
Scrapy is by far the fastest tool for web scraping that I'm aware of.
Good luck!
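Whichever tool you pick, when the "next 50 records" button only triggers an AJAX call, one option is to call that endpoint directly and skip the browser. The sketch below is plain Python, not Scrapy or Mechanize, and rests on assumptions: that the endpoint returns a JSON list and accepts `offset` and `limit` query parameters. Both parameter names and the helper names are hypothetical; the real request should be read off the browser's network tab.

```python
import json
import urllib.parse
import urllib.request


def iter_records(fetch_page, page_size=50, max_pages=1000):
    """Yield records page by page until a short or empty page comes back."""
    for page in range(max_pages):
        batch = fetch_page(page * page_size, page_size)
        yield from batch
        if len(batch) < page_size:
            break  # the last page has been reached


def make_fetcher(endpoint):
    """Build a fetch_page(offset, limit) function for a JSON endpoint.

    The `offset` and `limit` parameter names are assumptions for illustration.
    """
    def fetch_page(offset, limit):
        query = urllib.parse.urlencode({"offset": offset, "limit": limit})
        with urllib.request.urlopen(endpoint + "?" + query) as response:
            return json.load(response)
    return fetch_page
```

Keeping the paging loop separate from the HTTP call means the same loop works if the fetch step is later replaced by a Scrapy request or a Selenium click.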

Prompt user to install/view app on mobile site

Does there exist some kind of plugin or lightweight method of determining whether
A. A user is using a mobile device
B. The user has a particular app
C. The user does not have a particular app.
And depending on what criteria the user satisfies, display a prompt (modal, overlay, pop-up) that allows the user to view the app (if installed) or to install it (if they do not have it installed).
I realize "A" can be achieved by using media queries but I am not sure how to configure the others.
I've seen this done on many many sites so I know that it is not uncommon (view screenshot). Ideally I just want to implement some quick solution. I'm looking for something similar to "Hello Bar" for mobile only, I suppose.
Any help will be appreciated.
Example: http://i.imgur.com/VkWKu.png (the prompt at the top of the browser)
I ended up finding this:
http://developer.apple.com/library/ios/#documentation/AppleApplications/Reference/SafariWebContent/PromotingAppswithAppBanners/PromotingAppswithAppBanners.html
Which is exactly what I was looking for and will work in tandem with the other solutions.
I would try this approach if you really need to know if a user has your app installed.
When your app is installed and first run, have it create a cookie. The one thing you have to remember is to use the CookieSyncManager: cookies that are set are stored in RAM, not persistent storage, and CookieSyncManager syncs the two.
CookieSyncManager.createInstance(context)
CookieSyncManager.getInstance().sync()
Once you've set the cookie, the website can read it and, if it's there, show the popup, etc. Oh, and show this popup only on mobile devices: http://www.quirksmode.org/js/detect.html
Android Developer On CookieSyncManager: http://developer.android.com/reference/android/webkit/CookieSyncManager.html
Blog post explaining the usage of the CookieSyncManager:
http://blog.tacticalnuclearstrike.com/2010/05/using-cookiesyncmanager/
I know how to do this with android not iOS or Windows...
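The website half of the cookie approach might look like the sketch below, written in Python for illustration. The cookie name `app_installed` and the user-agent markers are assumptions, and substring matching on the user agent is a crude stand-in for the detection script linked above.

```python
from http.cookies import SimpleCookie

# Crude mobile check; these substrings are illustrative assumptions.
MOBILE_MARKERS = ("Android", "iPhone", "iPad", "Mobile")


def choose_prompt(cookie_header, user_agent, cookie_name="app_installed"):
    """Return 'open', 'install', or None (no prompt for non-mobile visitors).

    `cookie_header` is the raw Cookie request header; `cookie_name` is the
    hypothetical cookie the app would have set on first run.
    """
    if not any(marker in user_agent for marker in MOBILE_MARKERS):
        return None
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    return "open" if cookie_name in cookies else "install"
```

The return value would then drive which banner the page renders: a link into the app for `'open'`, a store link for `'install'`, and nothing for desktop visitors.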
There's no standard way to do this.
See the end of this post: http://blogs.msdn.com/b/ieinternals/archive/2011/07/14/url-protocols-application-protocols-and-asynchronous-pluggable-protocols-oh-my.aspx for one mechanism available to JavaScript in IE10.
IE10's Metro environment offers this: http://blogs.msdn.com/b/ie/archive/2011/10/20/connect-your-web-site-to-your-windows-8-app.aspx but I don't think that exists for the mobile browser.
