Kameleo Automation - get a unique and random Chrome version when creating profiles - browser-automation

I'm using Kameleo's local-api-client-python with Selenium to automate browser profile generation and browser actions.
I noticed that when I automate profile creation, the profiles are all created with the same Chrome version. How do I make the version unique and random?
I'm using the example code provided in the README, but every time the profile is created with the same Chrome version. I need different versions:
from kameleo.local_api_client.kameleo_local_api_client import KameleoLocalApiClient
from kameleo.local_api_client.builder_for_create_profile import BuilderForCreateProfile
client = KameleoLocalApiClient()
base_profiles = client.search_base_profiles(
device_type='desktop',
browser_product='chrome'
)
# Create a new profile with recommended settings
# for browser fingerprinting protection
create_profile_request = BuilderForCreateProfile \
.for_base_profile(base_profiles[0].id) \
.set_recommended_defaults() \
.build()
profile = client.create_profile(body=create_profile_request)
# Start the browser
client.start_profile(profile.id)

When you call
base_profiles = client.search_base_profiles(
device_type='desktop',
browser_product='chrome'
)
it will return 25 base profiles for the given filtering criteria. The first couple of elements of the list will have the latest version of Chrome, but if you pick another element from the list you can get older versions as well.
To get a profile with a random Chrome version each time, you can change the following code from this:
create_profile_request = BuilderForCreateProfile \
.for_base_profile(base_profiles[0].id) \
.set_recommended_defaults() \
.build()
to this:
create_profile_request = BuilderForCreateProfile \
.for_base_profile(random.choice(base_profiles).id) \
.set_recommended_defaults() \
.build()
Also add import random at the top of your file.
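Putting it together, a minimal sketch of the full flow with a random base profile (the same calls as the README example above, only the selection changes):
import random
from kameleo.local_api_client.kameleo_local_api_client import KameleoLocalApiClient
from kameleo.local_api_client.builder_for_create_profile import BuilderForCreateProfile
client = KameleoLocalApiClient()
# desktop Chrome base profiles come back with a mix of versions
base_profiles = client.search_base_profiles(
    device_type='desktop',
    browser_product='chrome'
)
# pick a random base profile so the Chrome version differs between runs
random_base_profile = random.choice(base_profiles)
create_profile_request = BuilderForCreateProfile \
    .for_base_profile(random_base_profile.id) \
    .set_recommended_defaults() \
    .build()
profile = client.create_profile(body=create_profile_request)
client.start_profile(profile.id)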

Related

How to make Selenium-Wire perform an indirect GraphQL AJAX request I expect and need?

Background story: I need to obtain the handles of the tagged Twitter users from an attached Twitter media. There's no current API method to do that unfortunately (see https://twittercommunity.com/t/how-to-get-tags-of-a-media-in-a-tweet/185614 and https://github.com/twitterdev/open-evolution/issues/34).
I have no other choice but to scrape, this is an example URL: https://twitter.com/justinwood_/status/1626275168157851650/media_tags. This is the page which pops up when you click on the tags link under the media of the parent Tweet: https://twitter.com/justinwood_/status/1626275168157851650/
The React-generated DOM is deep and ugly, but would be scrapeable; however, I do not want to log in with any account and get banned. Unfortunately, when you visit https://twitter.com/justinwood_/status/1626275168157851650/media_tags in an Incognito window, the popup shows up dead empty. However, when I dig into the network requests, the /TweetDetail GraphQL endpoint is full of messages about the anonymous page visit, and fortunately it still contains the list of handles I need despite all of this.
So what I need is a scraper that can process JavaScript and capture the response for that specific GraphQL call. Selenium drives a real (optionally headless) Chrome, so it can process JavaScript, and Selenium-Wire offers the ability to capture responses.
Unfortunately, my crafted Selenium-Wire script only captures the TweetResultByRestId and UsersByRestId GraphQL requests but is missing TweetDetail. I don't know what to tweak to make all the requests happen. I have iterated over a ton of Chrome options. Here is a variation of my script:
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--headless") # for Jenkins
chrome_options.add_argument("--disable-dev-shm-usage") # Jenkins
chrome_options.add_argument('--start-maximized')
chrome_options.add_argument('--window-size=1900,1080')
chrome_options.add_argument('--ignore-certificate-errors-spki-list')
chrome_options.add_argument('--ignore-ssl-errors')
selenium_options = {
'request_storage_base_dir': '/tmp', # Use /tmp to store captured data
'exclude_hosts': ''
}
ser = Service('/usr/bin/chromedriver')
ser.service_args=["--verbose", "--log-path=test.log"]
driver = webdriver.Chrome(service=ser, options=chrome_options, seleniumwire_options=selenium_options)
tweet_id = "1626275168157851650"
twitter_media_url = f"https://twitter.com/justinwood_/status/{tweet_id}/media_tags"
driver.get(twitter_media_url)
driver.wait_for_request("/TweetDetail", timeout=10)
Any ideas?
Apparently I'd rather need to scrape the parent Tweet URL https://twitter.com/justinwood_/status/1626275168157851650/, and right now it seems the GraphQL call I am after does happen there. I probably got confused while trying 100 combinations.
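For reference, a sketch of how the captured response body could be read once the request of interest shows up, continuing from the script above. This assumes Selenium-Wire's wait_for_request/response API and its decode helper; the /TweetDetail substring is the one already used above:
import json
from seleniumwire.utils import decode  # selenium-wire helper for gzip/br/deflate bodies
# wait_for_request returns the matching request once a response has been captured
request = driver.wait_for_request("/TweetDetail", timeout=10)
if request.response:
    body = decode(request.response.body,
                  request.response.headers.get('Content-Encoding', 'identity'))
    data = json.loads(body.decode('utf-8'))
    print(json.dumps(data, indent=2)[:500])  # peek at the GraphQL payload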

Unable to get Selenium Code Working to POST data to Court (PACER) Practice/Training Site

I have been trying to create a script to do the following while I have access to a training website I am using for electronic case filings. This requires the following steps:
Going to the following site: https://ecf-train.nvb.uscourts.gov/
clicking the HTML link on that page: "District of Nevada Train Database - Document Filing System"
This then redirects me to this site, where I POST my login credentials (username/pw/testclientcode/flag=1): (https://train-login.uscourts.gov/csologin/login.jsf?pscCourtId=NVTBK&appurl=https://ecf-train.nvb.uscourts.gov/cgi-bin/login.pl)
the "flag" checks a "redaction" clicks in and then is supposed to land me or give me the main access page to select what I want to e-file in the testing system (this is all approved by the Court for testing, FYI). I am attaching a screenshot of what it should look like when logged in successfully, so I can then click the "Bankruptcy" header link ECF Header Links, which then brings me to the next target page (https://ecf-train.nvb.uscourts.gov/cgi-bin/DisplayMenu.pl?BankruptcyEvents&id=1227536), where I can select the link for "Case Upload" (see attached screenshot for 'CaseUpload')Case Upload Selection Link
I am new to Python, but I believe I took all the correct steps to download the Chrome webdriver and install Selenium using "pip install selenium". However, my code is not finding the webdriver "PATH", which points to a file saved directly on my "C:" drive.
Any help getting this to work is HUGELY appreciated. Here is the code I have been using in Python, with the errors as well:
CODE:
import requests
from selenium.webdriver.chrome.service import Service
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import sys
import os
import json
import time
# pass in header info which is "Content-Type:application/json"
url = "https://ecf-train.nvb.uscourts.gov/"
# payload contains the header information for Content Type and Accept
payload = {"Content-Type":"application/json", "Accept":"application/json"}
r = requests.post(url, data=payload)
pacer_target = requests.get(url)
pacer_target.text
print(r.text)
# defining the api-endpoint
API_ENDPOINT = 'url'
requests.get(API_ENDPOINT.text)
# data to be sent to api - I removed my userid and pw from this post
data = {
"loginId":"XXXXXXXXX",
"password":"XXXXXXXXX",
"clientCode":"anycode",
"redactFlag":"1"}
# sending post request and saving response as response object
r = requests.post(url = API_ENDPOINT, data = data)
print(r.text, "STEP 1 - logged into main Traing screen!!")
# Import Selenium Executable File for Chrome from C:\\ drive on my desktop and use the location path
PATH = 'C:\chromedriver.exe'
driver = webdriver.Chrome(PATH)
# browser = webdriver.Chrome()
continue_link = browser.find_element(By.PARTIAL_LINK_TEXT, 'cgi-bin/login')
continue_link.click()
# get response from the clicked URL and print the response
response = requests.get(url)
print(url.text, "this is step 2 - just clicked the main login button!!!!!")
# sending post request and saving response as response object
r = requests.post(url = API_ENDPOINT, data = data)
# extracting response text
pastebin_url = r.text
print("The pastebin URL is:%s"%pastebin_url)

Python Selenium Failing to Acquire data

I am trying to download the 24-month data from www1.nseindia.com and it fails on Chrome and Firefox drivers. It just freezes after filling all the values in the required places and does not click. The webpage does not respond...
Below is the code that I am trying to execute:
import time
from selenium import webdriver
from selenium.webdriver.support.ui import Select
id_list = ['ACC', 'ADANIENT']
# Chrome
def EOD_data_Chrome():
    driver = webdriver.Chrome(executable_path="C:\Py388\Test\chromedriver.exe")
    driver.get('https://www1.nseindia.com/products/content/equities/equities/eq_security.htm')
    s1 = Select(driver.find_element_by_id('dataType'))
    s1.select_by_value('priceVolume')
    s2 = Select(driver.find_element_by_id('series'))
    s2.select_by_value('EQ')
    s3 = Select(driver.find_element_by_id('dateRange'))
    s3.select_by_value('24month')
    driver.find_element_by_name("symbol").send_keys("ACC")
    driver.find_element_by_id("get").click()
    time.sleep(9)
    s6 = Select(driver.find_element_by_class_name("download-data-link"))
    s6.click()
# FireFox (Gecko)
def EOD_data_Gecko():
    driver = webdriver.Firefox(executable_path="C:\Py388\Test\geckodriver.exe")
    driver.get('https://www1.nseindia.com/products/content/equities/equities/eq_security.htm')
    s1 = Select(driver.find_element_by_id('dataType'))
    s1.select_by_value('priceVolume')
    s2 = Select(driver.find_element_by_id('series'))
    s2.select_by_value('EQ')
    s3 = Select(driver.find_element_by_id('dateRange'))
    s3.select_by_value('24month')
    driver.find_element_by_name("symbol").send_keys("ACC")
    driver.find_element_by_id("get").click()
    time.sleep(9)
    s6 = Select(driver.find_element_by_class_name("download-data-link"))
    s6.click()
EOD_data_Gecko()
# Change the above final line to "EOD_data_Chrome()" and still it just remains stuck...
Kindly help with what is missing in that code to download the 24-month data... When I perform the same in a normal browser, with manual clicks, it is successful...
When you are manually doing it in a browser, you can change the values as below:
Set first drop down to : Security wise price volume data
"Enter Symbol" : ACC
"Select Series" : EQ
"Period" (radio button: "For Past") : 24 Months
Then click on the "Get Data" button; in about 3-5 seconds the data loads, and when you click on "Download file in CSV format" you get the CSV file in your downloads.
I need help using any library you know for scraping in Python: Selenium, BeautifulSoup, Requests, Scrapy, etc. It doesn't really matter, as long as it is Python.
Edit: @Patrick Bormann, please find the screenshot... The Get Data button works.
When you say that it works manually, have you tried to simulate a click with ActionChains instead of the internal click function?
from selenium.webdriver.common.action_chains import ActionChains
easy_apply = Select(driver.find_element_by_id('dateRange'))
actions = ActionChains(driver)
actions.move_to_element(easy_apply)
actions.click(easy_apply)
actions.perform()
and then simulate a mouse movement to the specific value?
In addition, I tried it on my own and I didn't get any data when pushing the Get Data button, which seems to have a class of "get" as you mentioned; that button didn't work for me. But as you can see there is a second button called "full download", perhaps you could try that one? Because the Get Data button doesn't work on Firefox or Chrome (when I tested it).
Did you already try to catch it through the link?
Update
As the OP asked for help in this urgent matter, I delivered a working solution.
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time
from selenium.webdriver.support.ui import Select
chrome_driver_path = "../chromedriver.exe"
driver = webdriver.Chrome(executable_path=chrome_driver_path)
driver.get('https://www1.nseindia.com/products/content/equities/equities/eq_security.htm')
driver.execute_script("document.body.style.zoom='zoom 25%'")
time.sleep(2)
price_volume = driver.find_element_by_xpath('//*[@id="dataType"]/option[2]').click()
time.sleep(2)
date_range = driver.find_element_by_xpath('//*[@id="dateRange"]/option[8]').click()
time.sleep(2)
series = driver.find_element_by_name('series')
time.sleep(2)
drop = Select(series)
drop.select_by_value("EQ")
time.sleep(2)
driver.find_element_by_name("symbol").send_keys("ACC")
ez_download = driver.find_element_by_xpath('//*[#id="wrapper_btm"]/div[1]/div[3]/a')
actions = ActionChains(driver)
actions.move_to_element(ez_download)
actions.click(ez_download)
actions.perform()
Here you go, sorry it took a little while, I had to bring my son to bed...
This solution produces the output I hope is correct. If you want to select other drop-down menus, you can change the string in the Select (a string, because there are too many indexes to handle) or the number in the XPath, as that number is the option index. The sleeps are normally only needed for elements which take time to build themselves up on the page, but in my experience changing things too fast sometimes causes errors. Feel free to change the time limits and see if it still works.
I hope you can now go on again making some money for your living in India.
All the best, Patrick.
Do not hesitate to ask if you have any questions.
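As a side note on the sleeps mentioned above: an explicit wait is a common, more robust alternative, because it polls until the element is actually there. A small sketch (the download-data-link class name is the one used in the code in this thread):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
# wait up to 10 seconds for the download link instead of a fixed sleep
wait = WebDriverWait(driver, 10)
download_link = wait.until(
    EC.presence_of_element_located((By.CLASS_NAME, "download-data-link"))
)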
UPDATE2
After one long night and another day, we figured out that the freezing originates from the website itself, as the website uses:
Boomerang | Akamai Developer (developer.akamai.com/tools/…): Boomerang is a JavaScript library for Real User Monitoring (commonly called RUM). Boomerang measures the performance characteristics of real-world page loads and interactions. The documentation on this page is for mPulse's Boomerang. General API documentation for Boomerang can be found at docs.soasta.com/boomerang-api/.
This is what I discovered from the HTML header.
This is clearly bot-detection JavaScript. With the help of this SO post:
Can a website detect when you are using Selenium with chromedriver?
and the second paragraph from that post: https://piprogramming.org/articles/How-to-make-Selenium-undetectable-and-stealth--7-Ways-to-hide-your-Bot-Automation-from-Detection-0000000017.html
I finally solved the issue:
we changed the var key in chromedriver to something else, like:
var key = '$dsjfgsdhfdshfsdiojisdjfdsb_';
In addition I changed the code to:
import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.options import Options
options = webdriver.ChromeOptions()
chrome_driver_path = "../chromedriver.exe"
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
driver = webdriver.Chrome(executable_path=chrome_driver_path, options=options)
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.get('http://www1.nseindia.com/products/content/equities/equities/eq_security.htm')
driver.execute_script("document.body.style.zoom='zoom 25%'")
time.sleep(5)
price_volume = driver.find_element_by_xpath('//*[@id="dataType"]/option[2]').click()
time.sleep(3)
date_range = driver.find_element_by_xpath('//*[@id="dateRange"]/option[8]').click()
time.sleep(5)
series = driver.find_element_by_name('series')
time.sleep(3)
drop = Select(series)
drop.select_by_value("EQ")
time.sleep(4)
driver.find_element_by_name("symbol").send_keys("ACC")
actions = ActionChains(driver)
ez_download = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[2]/div[1]/div[3]/div/div[1]/form/div[2]/div[3]/p/img')
actions.move_to_element(ez_download)
actions.click(ez_download)
actions.perform()
# essential because the button has to be loaded
time.sleep(5)
driver.find_element_by_class_name('download-data-link').click()
The code finally worked and the OP is happy.
I edited chromedriver.exe using a hex editor, replaced cdc_ with dog_, and saved it. Then I executed the below code using the Chrome driver.
import selenium
from selenium import webdriver
from selenium.webdriver.support.select import Select
import time
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--disable-blink-features")
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options)
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
# Open the website
driver.get('https://www1.nseindia.com/products/content/equities/equities/eq_security.htm')
symbol_box = driver.find_element_by_id('symbol')
symbol_box.send_keys('20MICRONS')
driver.implicitly_wait(10)
#rd_period=driver.find_element_by_id('rdPeriod')
#rd_period.click()
list_daterange=driver.find_element_by_id('dateRange')
list_daterange=Select(list_daterange)
list_daterange.select_by_value('24month')
driver.implicitly_wait(10)
btn_getdata = driver.find_element_by_xpath('//*[@id="get"]')
btn_getdata.click()
driver.implicitly_wait(100)
print("Clicked button")
lnk_downloadData=driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[2]/div[1]/div[3]/div/div[3]/div[1]/span[2]/a')
lnk_downloadData.click()
This code is working fine as of now, but the problem is that it is not a permanent solution. NSE keeps updating its logic to detect bot execution, so, like NSE, we will also have to keep updating our code. Please let me know if this code stops working; I will figure out some other solution.
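For completeness, the cdc_ replacement described above (done here with a hex editor) can also be scripted. A sketch that patches a copy of the binary, assuming the replacement string keeps exactly the same length so offsets inside the binary stay valid:
from pathlib import Path
src = Path("chromedriver.exe")
dst = Path("chromedriver_patched.exe")
data = src.read_bytes()
# 'cdc_' and 'dog_' have the same length, so nothing else in the file shifts
patched = data.replace(b"cdc_", b"dog_")
dst.write_bytes(patched)
print(f"Replaced {data.count(b'cdc_')} occurrence(s)")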

How do I enable access-period on freeradius 3.0 for daloradius

I am trying to enable access-period on FreeRADIUS 3.0.
I have added the access-period module and referenced it in the authorize section.
sqlcounter accessperiod {
counter-name = Max-Access-Period-Time
check-name = Access-Period
sqlmod-inst = sql
key = User-Name
reset = never
query = "SELECT UNIX_TIMESTAMP() - UNIX_TIMESTAMP(AcctStartTime) FROM radacct \
WHERE UserName = '%{%k}' LIMIT 1"
}
In the authorization part:
accessperiod
}
Loaded module rlm_sqlcounter
# Loading module "accessperiod" from file /etc/freeradius/3.0/mods-enabled/acessperiod
sqlcounter accessperiod {
/etc/freeradius/3.0/mods-enabled/acessperiod[1]: Configuration item "sql_module_instance" must have a value
/etc/freeradius/3.0/mods-enabled/acessperiod[1]: Invalid configuration for module "accessperiod"
It's simple.
Check the FreeRADIUS 3 sqlcounter format, then adjust the sqlcounter from the contrib directory to work with FreeRADIUS 3. The counter format in contrib is for FreeRADIUS 2.
The other option: try to downgrade to FreeRADIUS 2 if you are able to do that.
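For illustration, FreeRADIUS 3 sqlcounter uses underscore-style item names (the error above already hints at sql_module_instance). A sketch of what the adjusted module might look like; the exact item names and query expansion are assumptions to verify against the 3.0 documentation:
sqlcounter accessperiod {
sql_module_instance = sql
counter_name = Max-Access-Period-Time
check_name = Access-Period
key = User-Name
reset = never
query = "SELECT UNIX_TIMESTAMP() - UNIX_TIMESTAMP(AcctStartTime) FROM radacct \
WHERE UserName = '%{${key}}' LIMIT 1"
}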

Yocto CI build number? PR Service does not increment ${PR}

I'm trying to use the PR Service of Yocto (fido), but each time I launch bitbake on my recipe the package gets ${PR}=r0.
local.conf
INHERIT += "buildhistory"
BUILDHISTORY_COMMIT = "1"
PRSERV_HOST = "localhost:0"
recipe.bb
SRCREV = "${AUTOREV}"
BPV = "1.1.0"
PV = "${BPV}+gitr${SRCPV}" # I know, I should use a tag instead.
SRC_BRANCH = "master"
SRC_URI = "xxx.git;protocol=ssh;branch=${SRC_BRANCH}"
This produces a package with the name xxx_1.1.0+gitrAUTOINC+e7de1c757a-r0.0.
I was expecting to get
Build #1
xxx_1.1.0+gitr0+e7de1c757a-r0.0
Build #2
xxx_1.1.0+gitr1+e7de1c757a-r1.0
And so on...
I want to use the PR as the build number, getting something like "1.1.0.453", where the format is "major.minor.revision.build-number".
I see two problems here:
The PR is not incremented, even if I change the recipe or the project source code.
The name of the package is not the one I'm expecting. Why is there an "r0" before the git part, and why is the revision "r0.0" instead of "r0"?
Best regards,
It's not expected to increment PR; it increments EXTENDPRAUTO (which is appended after PR when PKGR is formed).
It's also used in SRCPV to get an always-increasing number in front of the git hash (every time the hash changes to something the PR service hasn't seen for this recipe before, it returns max+1).
And you shouldn't use tags in SRCREV, because bitbake will always run git ls-remote against the remote git repository to convert tag names to a git SHA (which breaks when you cannot connect to the git repository, e.g. when disconnected from the VPN, and also significantly slows down the parsing of the recipes).
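To make the relationship concrete, a rough sketch of how these variables combine (the PKGR line reflects the default bitbake.conf definition as I recall it, so verify against your release):
# PKGR is the revision that ends up in the package file name:
# PR stays r0 while the PR service bumps EXTENDPRAUTO (.0, .1, .2, ...)
PKGR = "${PR}${EXTENDPRAUTO}"
# SRCPV also consults the PR service, turning AUTOINC into an increasing
# counter, e.g. 1.1.0+gitr0+e7de1c757a, then 1.1.0+gitr1+<next hash>, ...
PV = "${BPV}+gitr${SRCPV}"
This matches the -r0.0 suffix in the name above: r0 is PR and .0 is the value the PR service put into EXTENDPRAUTO.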
