response status is not 200 on running webdriver to get url and on running beautiful soup to extract content, it throws attribute error - selenium-webdriver

I have been trying to web scrape hotel reviews but on multiple page jumps, the url of the webpage doesn't change. So I am using webdriver from selenium to work this out. It is not showing any error but on checking if the response status is 200, it is showing false. In addition to that, running the line of code which I have mentioned below generates an error. If anyone can fix the issue, effort will be highly appreciated!
!pip install selenium
from selenium import webdriver
import requests
from bs4 import BeautifulSoup
import pandas as pd
# install chromium, its driver, and selenium
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
# set options to be headless, ..
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results
wd = webdriver.Chrome('chromedriver',options=options)
code = wd.get('https://www.goibibo.com/hotels/highland-park-hotel-in-trivandrum-1383427384655815037/?hquery={%22ci%22:%2220211209%22,%22co%22:%2220211210%22,%22r%22:%221-2-0%22,%22ibp%22:%22v15%22}&hmd=766931490eb7863d2f38f56c6185a1308de782c89dfeeea59d262b827ca15441bf50472cbfdc1ee84aeed8af756809a2e89cfd6eaea0fa308c1ca839e8c313d016ac0f5948658353cf30f1cd83050fd8e6adb2e55f2a5470cadeb0c28b7becc92ac44d81966b82408effde826d40fbff47525e09b5f145e321fe6d104e12933c066323798e33a911e0cbed7312fc1634f8f92fe502c8602556c9a02f34c047d04ff1400c995799156776c1a04e218d6486493edad5b0f7e51a5ea25f5f1cb4f5ed497ee9368137f6ec73b3b1166ee7c1a885920b90c98542e0270b4fa9004005cfe87a4d1efeaedc8e33a848f73345f09bec19153e8bf625cc7f9216e692a1bcc313e7f13a7fc091328b1fb43598bd236994fdc988ab35e70cf3a5d1856c0b0fa9794b23a1a958a5937ac6d258d121a75b7ce9fc70b9a820af43a8e9a3f279be65b5c6fbfff2ba20bfb0f3e3ee425f0b930bf671c50878a540c6a9003b197622b6ab22ae39e07b5174cb12bebbcd2a132bb8570e01b9e253c1bd83cb292de97a&cc=IN&reviewType=gi&vcid=3877384277955108166&srpFilters={%22type%22:[%22Hotel%22]}')
str(code) == "<Response [200]>"
**Output: ** False
soup = BeautifulSoup(code.content,'html.parser')
On running the below line of code, there comes an error:
AttributeError Traceback (most recent call
last) in () ----> 1 soup
= BeautifulSoup(code.content,'html.parser')
AttributeError: 'NoneType' object has no attribute 'content'

get()
get(url: str) loads a web page in the current browser session and doesn't returns anything.
Hence, as per your code, code will be always NULL.
Solution
To validate the Response you can adopt any of the two approaches:
Using requests.head():
import requests
request_response = requests.head(https://www.goibibo.com/hotels/highland-park-hotel-in-trivandrum-1383427384655815037/?hquery={%22ci%22:%2220211209%22,%22co%22:%2220211210%22,%22r%22:%221-2-0%22,%22ibp%22:%22v15%22}&hmd=766931490eb7863d2f38f56c6185a1308de782c89dfeeea59d262b827ca15441bf50472cbfdc1ee84aeed8af756809a2e89cfd6eaea0fa308c1ca839e8c313d016ac0f5948658353cf30f1cd83050fd8e6adb2e55f2a5470cadeb0c28b7becc92ac44d81966b82408effde826d40fbff47525e09b5f145e321fe6d104e12933c066323798e33a911e0cbed7312fc1634f8f92fe502c8602556c9a02f34c047d04ff1400c995799156776c1a04e218d6486493edad5b0f7e51a5ea25f5f1cb4f5ed497ee9368137f6ec73b3b1166ee7c1a885920b90c98542e0270b4fa9004005cfe87a4d1efeaedc8e33a848f73345f09bec19153e8bf625cc7f9216e692a1bcc313e7f13a7fc091328b1fb43598bd236994fdc988ab35e70cf3a5d1856c0b0fa9794b23a1a958a5937ac6d258d121a75b7ce9fc70b9a820af43a8e9a3f279be65b5c6fbfff2ba20bfb0f3e3ee425f0b930bf671c50878a540c6a9003b197622b6ab22ae39e07b5174cb12bebbcd2a132bb8570e01b9e253c1bd83cb292de97a&cc=IN&reviewType=gi&vcid=3877384277955108166&srpFilters={%22type%22:[%22Hotel%22]})
status_code = request_response.status_code
if status_code == 200:
print("URL is valid/up")
else:
print("URL is invalid/down")
Using urlopen():
import requests
import urllib
status_code = urllib.request.urlopen(https://www.goibibo.com/hotels/highland-park-hotel-in-trivandrum-1383427384655815037/?hquery={%22ci%22:%2220211209%22,%22co%22:%2220211210%22,%22r%22:%221-2-0%22,%22ibp%22:%22v15%22}&hmd=766931490eb7863d2f38f56c6185a1308de782c89dfeeea59d262b827ca15441bf50472cbfdc1ee84aeed8af756809a2e89cfd6eaea0fa308c1ca839e8c313d016ac0f5948658353cf30f1cd83050fd8e6adb2e55f2a5470cadeb0c28b7becc92ac44d81966b82408effde826d40fbff47525e09b5f145e321fe6d104e12933c066323798e33a911e0cbed7312fc1634f8f92fe502c8602556c9a02f34c047d04ff1400c995799156776c1a04e218d6486493edad5b0f7e51a5ea25f5f1cb4f5ed497ee9368137f6ec73b3b1166ee7c1a885920b90c98542e0270b4fa9004005cfe87a4d1efeaedc8e33a848f73345f09bec19153e8bf625cc7f9216e692a1bcc313e7f13a7fc091328b1fb43598bd236994fdc988ab35e70cf3a5d1856c0b0fa9794b23a1a958a5937ac6d258d121a75b7ce9fc70b9a820af43a8e9a3f279be65b5c6fbfff2ba20bfb0f3e3ee425f0b930bf671c50878a540c6a9003b197622b6ab22ae39e07b5174cb12bebbcd2a132bb8570e01b9e253c1bd83cb292de97a&cc=IN&reviewType=gi&vcid=3877384277955108166&srpFilters={%22type%22:[%22Hotel%22]}).getcode()
if status_code == 200:
print("URL is valid/up")
else:
print("URL is invalid/down")

Related

Unable to perform action chains with Appium Python Client

I have spent much time to find a way to perform right click with appium python client on a application in windows platform. Unfortunately, I get the following exception:
selenium.common.exceptions.WebDriverException: Message: Currently only pen and touch pointer input source types are supported
My code is in the following lines:
import unittest
from typing import List, Optional, Tuple, TypeVar
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
class SimpleCalculatorTests(unittest.TestCase):
#classmethod
def setUpClass(self):
# set up appium
desired_caps = {}
desired_caps["app"] = "Root"
desired_caps["deviceName"] = "WindowsPC"
desired_caps["platformName"] = "Windows"
self.driver = webdriver.Remote(
command_executor='http://127.0.0.1:4723/wd/hub',
desired_capabilities=desired_caps)
#classmethod
def tearDownClass(self):
self.driver.quit()
def test_initialize(self):
communation_object = self.driver.find_element(By.XPATH, "/Pane[#ClassName=\"#32769\"][#Name=\"Desktop 1\"]/Window[#Name=\"ETS5™ - Neues Projekt\"][#AutomationId=\"windowApplication\"]/Custom[#AutomationId=\"eTS4UC\"]/Custom[#ClassName=\"Workspace\"]/Custom[#ClassName=\"ContentPanelContainer\"]/Custom[#AutomationId=\"Control\"]/Custom[#ClassName=\"ActiveComObjectDetailView\"]/DataGrid[#ClassName=\"DataGrid\"]/DataItem[#ClassName=\"DataGridRow\"][#Name=\"21: Ventilausgang 1 (Bezeichnung) - Eingang - Stellgröße\"]/Custom[#ClassName=\"DataGridCell\"][#Name=\"Ventilausgang 1 (Bezeichnung) - Eingang\"]/Text[#ClassName=\"TextBlock\"][#Name=\"Ventilausgang 1 (Bezeichnung) - Eingang\"]")
actions = ActionChains(self.driver)
actions.context_click(communation_object)
actions.perform()
if __name__ == '__main__':
suite = unittest.TestLoader().loadTestsFromTestCase(SimpleCalculatorTests)
unittest.TextTestRunner(verbosity=2).run(suite)
Is there any alternative to perform right click?

mobile devices requests emulation using JSR223 in JMeter - No such property: driver for class

Scenario:
open main page and click on "Accept All Cookies" (JSR223 Sampler1 in Once Only controller);
open pages from the set of parametrized urls (JSR223 Sampler2 in another controller).
JSR223 Sampler1 code for main page:
import org.apache.jmeter.samplers.SampleResult; import
org.openqa.selenium.chrome.ChromeOptions; import
org.openqa.selenium.chrome.ChromeDriver; import
org.openqa.selenium.WebDriver; import org.openqa.selenium.By; import
org.openqa.selenium.WebElement; import
org.openqa.selenium.support.ui.ExpectedConditions; import
org.openqa.selenium.support.ui.WebDriverWait; import
java.util.concurrent.TimeUnit;
System.setProperty("webdriver.chrome.driver",
"vars.get("webdriver_path")");
Map<String, Object> mobileEmulation = new HashMap<>();
mobileEmulation.put("userAgent", "vars.get("userAgent")");
Map<String, Object> chromeOptions = new HashMap<>();
chromeOptions.put("mobileEmulation", mobileEmulation); ChromeOptions
options = new ChromeOptions();
options.setExperimentalOption("mobileEmulation", mobileEmulation);
ChromeDriver driver = new ChromeDriver(options);
driver.get("https://vars.get("main_page")"); WebDriverWait wait = new
WebDriverWait(driver, 20);
wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath("xpath")));
driver.findElement(By.xpath("xpath")).click();
log.info(driver.getTitle());
JSR223 Sampler2 code for any page from the set of urls:
driver.get("https://${url}");
Error message:
Response message:javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: driver for class
Problem:
If I just copy all code from JSR223 Sampler1 to JSR223 Sampler2 and change destination url, urls are opening, but in unproper way - launching each time new browser instance, and I can't have realistic response time (for driver.get("url") only), because result provides time of Sampler work, which includes driver initialization, new browser instance start and it takes several seconds...
Could you please propose any ideas, how is possible to resolve this problem? To get all requests in 1 browser instance and to have realistic response time for all requests in JSR223 Sampler2 for browser.get("url") only?
Will appreciate for any help.
In the first JSR223 Sampler you need to store your driver instance into JMeter Variables like:
vars.putObject("driver", driver)
it should be the last line of your script
In the second JSR223 Sampler you need to get the driver instance from JMeter Variables like:
driver = vars.getObject("driver")
it should be the first line of your script
vars is the shorthand for JMeterVariables class instance, see the JavaDoc for all available functions and Top 8 JMeter Java Classes You Should Be Using with Groovy article for more information on JMeter API shorthands available for JSR223 Test Elements
P.S. the same approach with vars you should follow when executing driver.get() function like:
driver.get("https://" + vars.get("url"))

In Selenium webdriver, for remote Firefox how to use OSS bridge instead of w3c bridge for handshaking

I am using selenoid for remote browser testing in ruby.
In that I am using 'selenium-webdriver', 'capybara', 'rspec' for automation. And I am using attach_file method for uploading file to browser
I want to upload file on Firefox and Chrome browser but it raises error on both;
In chrome
Selenium::WebDriver::Error::UnknownCommandError: unknown command: unknown command: session/***8d32e045e3***/se/file
In firefox
unexpected token at 'HTTP method not allowed'
So After searching I found the solution for chrome which is to set w3c option false in caps['goog:chromeOptions'] > caps['goog:chromeOptions'] = {w3c: false}
So now chrome is using OSS bridge for handshaking but I don't know how to do it in Firefox. Similar solution is not available for Firefox.
My browser capabilities are following:
if ENV['BROWSER'] == 'firefox'
caps = Selenium::WebDriver::Remote::Capabilities.new
caps['browserName'] = 'firefox'
# caps['moz:firefoxOptions'] = {w3c: false} ## It is not working
else
caps = Selenium::WebDriver::Remote::Capabilities.new
caps["browserName"] = "chrome"
caps["version"] = "81.0"
caps['goog:chromeOptions'] = {w3c: false}
end
caps["enableVNC"] = true
caps["screenResolution"] = "1280x800"
caps['sessionTimeout'] = '15m'
Capybara.register_driver :selenium do |app|
Capybara::Selenium::Driver.new(app, browser: :remote,
:desired_capabilities => caps,
:url => ENV["REMOTE_URL"] || "http://*.*.*.*:4444/wd/hub"
)
end
Capybara.configure do |config|
config.default_driver = :selenium
end
I have found the problem. There is bug in selenium server which run on java so I have to change my selenium-webdriver gem version 3.142.7 and monkey-patch.
You can find more information here about the bug and resolution.
For now I have to change my gem and monkey patch the selenium-webdriver-3.142.7\lib\selenium\webdriver\remote\w3c\commands.rb file. check for below line which is on line no 150.
upload_file: [:post, 'session/:session_id/se/file']
and update it to
upload_file: [:post, 'session/:session_id/file']
i had a similar issue with rails 7. the issue is connected with the w3c standard. the core problem is that the webdriver for chrome uses a non-w3c standard url for handling file uploads. when uploading a file, the webdriver uses the /se/file url path to upload. this path is only supported by the selenium server. subsequently, the docker image provided by selenium works fine. yet, if we use chromedriver, the upload fails. more info.
we can solve this, by forcing the webdriver to use the standard-compliant url by overriding the :upload_file key in Selenium::WebDriver::Remote::Bridge::COMMANDS. since, the initialization of this the COMMANDS constant does not happen when the module is loaded, we can override the attach_file method to make sure the constant is set correctly. here the hacky code:
module Capybara::Node::Actions
alias_method :original_attach_file, :attach_file
def attach_file(*args, **kwargs)
implement_hacky_fix_for_file_uploads_with_chromedriver
original_attach_file(*args, **kwargs)
end
def implement_hacky_fix_for_file_uploads_with_chromedriver
return if #hacky_fix_implemented
original_verbose, $VERBOSE = $VERBOSE, nil # ignore warnings
cmds = Selenium::WebDriver::Remote::Bridge::COMMANDS.dup
cmds[:upload_file] = [:post, "session/:session_id/file"]
Selenium::WebDriver::Remote::Bridge.const_set(:COMMANDS, cmds)
$VERBOSE = original_verbose
#hacky_fix_implemented = true
end
end
In Firefox images we support /session/<id>/file API by adding Selenoid binary which emulates this API instead of Geckodriver (which does not implement it).

How to do a Post/Redirect/Get (PRG) in FastAPI?

I am trying to redirect from POST to GET. How to achieve this in FastAPI?
What did you try?
I have tried below with HTTP_302_FOUND, HTTP_303_SEE_OTHER as suggested from Issue#863#FastAPI: But Nothing Works!
It always shows INFO: "GET / HTTP/1.1" 405 Method Not Allowed
from fastapi import FastAPI
from starlette.responses import RedirectResponse
import os
from starlette.status import HTTP_302_FOUND,HTTP_303_SEE_OTHER
app = FastAPI()
#app.post("/")
async def login():
# HTTP_302_FOUND,HTTP_303_SEE_OTHER : None is working:(
return RedirectResponse(url="/ressource/1",status_code=HTTP_303_SEE_OTHER)
#app.get("/ressource/{r_id}")
async def get_ressource(r_id:str):
return {"r_id": r_id}
# tes is the filename(tes.py) and app is the FastAPI instance
if __name__ == '__main__':
os.system("uvicorn tes:app --host 0.0.0.0 --port 80")
You can also see this issue here at FastAPI BUGS Issues
I also ran into this and it was quite unexpected. I guess the RedirectResponse carries over the HTTP POST verb rather than becoming an HTTP GET. The issue covering this over on the FastAPI GitHub repo had a good fix:
POST endpoint
import starlette.status as status
#router.post('/account/register')
async def register_post():
# Implementation details ...
return fastapi.responses.RedirectResponse(
'/account',
status_code=status.HTTP_302_FOUND)
Basic redirect GET endpoint
#router.get('/account')
async def account():
# Implementation details ...
The important and non-obvious aspect here is setting status_code=status.HTTP_302_FOUND.
For more info on the 302 status code, check out https://httpstatuses.com/302 Specifically:
Note: For historical reasons, a user agent MAY change the request method from POST to GET for the subsequent request. If this behavior is undesired, the 307 Temporary Redirect status code can be used instead.
In this case, that verb change is exactly what we want.

Going Headless in Selenium with Chrome using Python Executing, But Throwing Errors

Based off the answer here I am trying to get Chrome to run headless in my script.
The snippet of code below is inside of a function called login() that logs into our ERP system:
if headless == True:
options = Options()
options.headless = True
#Load webdriver
driver = webdriver.Chrome(options=options, executable_path=r'C:/Users/d.kelly/Desktop/Python/chromedriver_win32/chromedriver.exe')
if headless == False:
driver = webdriver.Chrome('C:/Users/d.kelly/Desktop/Python/chromedriver_win32/chromedriver.exe')
window_before_login = driver.window_handles[0]
### Removed Code Block that fills out login form and clicks 'Login' button ###
# Switch to new window ERP (PLEX) launches and close original blank one no longer needed.
window_before_login = driver.window_handles[0]
window_title = driver.title
driver.switch_to.window(window_before_login)
driver.close()
driver.switch_to.window(driver.window_handles[0])
When I call my function like so:
login(headless=False)
It throws no errors and my entire script executes just fine.
When I call my function like this:
def login(headless=True)
I get the following errors:
DevTools listening on ws://127.0.0.1:57567/devtools/browser/69f9e357-dccf-4e38-8d6b-78030462379a
[0204/072436.206:INFO:CONSOLE(6)] "Error parsing a meta element's content: ';' is not a valid key-value pair separator. Please use ',' instead.", source: https://test.plexonline.com/modules/systemadministration/login/index.aspx? (6)
[0204/072437.699:INFO:CONSOLE(6)] "Error parsing a meta element's content: ';' is not a valid key-value pair separator. Please use ',' instead.", source: https://test.plexonline.com/Modules/SystemAdministration/Login/Index.aspx (6)
[0204/072437.722:INFO:CONSOLE(1)] "Scripts may close only the windows that were opened by it.", source: (1)
[0204/072441.162:INFO:CONSOLE(751)] "Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user's experience. For more help, check https://xhr.spec.whatwg.org/.", source: https://test.plexonline.com/Modules/scripts/ajax.js (751)
I am using Chrome Version 79.0.3945.130 (Official Build) (64-bit), Selenium 3.141.0, and Python 3.7.4.
Any ideas what I am doing wrong? Thank you!

Resources