How to make Selenium-Wire perform an indirect GraphQL AJAX request I expect and need? - selenium-webdriver

Background story: I need to obtain the handles of the tagged Twitter users from an attached Twitter media. There's no current API method to do that unfortunately (see https://twittercommunity.com/t/how-to-get-tags-of-a-media-in-a-tweet/185614 and https://github.com/twitterdev/open-evolution/issues/34).
I have no other choice but to scrape, this is an example URL: https://twitter.com/justinwood_/status/1626275168157851650/media_tags. This is the page which pops up when you click on the tags link under the media of the parent Tweet: https://twitter.com/justinwood_/status/1626275168157851650/
The React generated DOM is deep and ugly, but would be scrapeable, however I do not want to log in with any account to get banned. Unfortunately when you visit https://twitter.com/justinwood_/status/1626275168157851650/media_tags in an Incognito window the popup shows up dead empty. However when I dig into the network requests the /TweetDetail GraphQL endpoint is full of messages about the anonymous page visit, fortunately it still contains the list of handles I need despite of all of this.
So what I need to have is a scraper which is able to process JavaScript, and capture the response for that specific GraphQL call. Selenium uses a headless Chrome under the hood, so it is able to process JavaScript, and Selenium-Wire offers the ability to capture the response.
Unfortunately my crafted Selenium-Wire script only has the TweetResultByRestId and UsersByRestId GraphQL requests but is missing the TweetDetail. I don't know what to tweak to make all the requests to happen. I iterated over a ton of Chrome options. Here is a variation of my script:
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--headless") # for Jenkins
chrome_options.add_argument("--disable-dev-shm-usage") # Jenkins
chrome_options.add_argument('--start-maximized')
chrome_options.add_argument('--window-size=1900,1080')
chrome_options.add_argument('--ignore-certificate-errors-spki-list')
chrome_options.add_argument('--ignore-ssl-errors')
selenium_options = {
'request_storage_base_dir': '/tmp', # Use /tmp to store captured data
'exclude_hosts': ''
}
ser = Service('/usr/bin/chromedriver')
ser.service_args=["--verbose", "--log-path=test.log"]
driver = webdriver.Chrome(service=ser, options=chrome_options, seleniumwire_options=selenium_options)
tweet_id = "1626275168157851650"
twitter_media_url = f"https://twitter.com/justinwood_/status/{tweet_id}/media_tags"
driver.get(twitter_media_url)
driver.wait_for_request("/TweetDetail", timeout=10)
Any ideas?

Apparently it looks like I'd rather need to scrape the parent Tweet URL https://twitter.com/justinwood_/status/1626275168157851650/ and right now it seems my craved GraphQL call happens. Probably I got confused while trying 100 combinations.

Related

I am using AutoJsContext in geckobrowser with that I am using evaluate scrpit. But now i am using webview2 is there a how can i get this

I am using AutoJsContext in geckobrowser with that I am using evaluate scrpit and asble to get data. But now i am using webview2 is there a how can i get this.
Gecko browser:
using (AutoJSContext context = new AutoJSContext(browser.Window))
{
var userIdResult = context
.EvaluateScript("userId", (nsIDOMWindow)browser.Window.DomWindow);
}
Above is the code I use in gecko browser. now i need to get user id from webview2. Please help me in this issue
I am not able to get any alternative
You can use CoreWebView2.ExecuteScriptAsync to inject script into the current top-level document of the WebView2 and receive the result as a JSON string. The method doesn't have a way to specify an execution context. Script is executed in the global context of the top-level document similar to running script in the console of DevTools in the browser.
For example something like the following:
var resultAsJSON = await coreWebView2.ExecuteScriptAsync('window.userId');

Blocked a frame with origin "https://example.com" from accessing a frame with origin "https://www.herokucdn.com". Protocols, domains, and ports [duplicate]

I am loading an <iframe> in my HTML page and trying to access the elements within it using JavaScript, but when I try to execute my code, I get the following error:
SecurityError: Blocked a frame with origin "http://www.example.com" from accessing a cross-origin frame.
How can I access the elements in the frame?
I am using this code for testing, but in vain:
$(document).ready(function() {
var iframeWindow = document.getElementById("my-iframe-id").contentWindow;
iframeWindow.addEventListener("load", function() {
var doc = iframe.contentDocument || iframe.contentWindow.document;
var target = doc.getElementById("my-target-id");
target.innerHTML = "Found it!";
});
});
Same-origin policy
You can't access an <iframe> with different origin using JavaScript, it would be a huge security flaw if you could do it. For the same-origin policy browsers block scripts trying to access a frame with a different origin.
Origin is considered different if at least one of the following parts of the address isn't maintained:
protocol://hostname:port/...
Protocol, hostname and port must be the same of your domain if you want to access a frame.
NOTE: Internet Explorer is known to not strictly follow this rule, see here for details.
Examples
Here's what would happen trying to access the following URLs from http://www.example.com/home/index.html
URL RESULT
http://www.example.com/home/other.html -> Success
http://www.example.com/dir/inner/another.php -> Success
http://www.example.com:80 -> Success (default port for HTTP)
http://www.example.com:2251 -> Failure: different port
http://data.example.com/dir/other.html -> Failure: different hostname
https://www.example.com/home/index.html:80 -> Failure: different protocol
ftp://www.example.com:21 -> Failure: different protocol & port
https://google.com/search?q=james+bond -> Failure: different protocol, port & hostname
Workaround
Even though same-origin policy blocks scripts from accessing the content of sites with a different origin, if you own both the pages, you can work around this problem using window.postMessage and its relative message event to send messages between the two pages, like this:
In your main page:
const frame = document.getElementById('your-frame-id');
frame.contentWindow.postMessage(/*any variable or object here*/, 'https://your-second-site.example');
The second argument to postMessage() can be '*' to indicate no preference about the origin of the destination. A target origin should always be provided when possible, to avoid disclosing the data you send to any other site.
In your <iframe> (contained in the main page):
window.addEventListener('message', event => {
// IMPORTANT: check the origin of the data!
if (event.origin === 'https://your-first-site.example') {
// The data was sent from your site.
// Data sent with postMessage is stored in event.data:
console.log(event.data);
} else {
// The data was NOT sent from your site!
// Be careful! Do not use it. This else branch is
// here just for clarity, you usually shouldn't need it.
return;
}
});
This method can be applied in both directions, creating a listener in the main page too, and receiving responses from the frame. The same logic can also be implemented in pop-ups and basically any new window generated by the main page (e.g. using window.open()) as well, without any difference.
Disabling same-origin policy in your browser
There already are some good answers about this topic (I just found them googling), so, for the browsers where this is possible, I'll link the relative answer. However, please remember that disabling the same-origin policy will only affect your browser. Also, running a browser with same-origin security settings disabled grants any website access to cross-origin resources, so it's very unsafe and should NEVER be done if you do not know exactly what you are doing (e.g. development purposes).
Google Chrome
Mozilla Firefox
Safari
Opera: same as Chrome
Microsoft Edge: same as Chrome
Brave: same as Chrome
Microsoft Edge (old non-Chromium version): not possible
Microsoft Internet Explorer
Complementing Marco Bonelli's answer: the best current way of interacting between frames/iframes is using window.postMessage, supported by all browsers
Check the domain's web server for http://www.example.com configuration for X-Frame-Options
It is a security feature designed to prevent clickJacking attacks,
How Does clickJacking work?
The evil page looks exactly like the victim page.
Then it tricked users to enter their username and password.
Technically the evil has an iframe with the source to the victim page.
<html>
<iframe src='victim-domain.example'/>
<input id="username" type="text" style="display: none;"/>
<input id="password" type="text" style="display: none;"/>
<script>
//some JS code that click jacking the user username and input from inside the iframe...
<script/>
<html>
How the security feature work
If you want to prevent web server request to be rendered within an iframe add the x-frame-options
X-Frame-Options DENY
The options are:
SAMEORIGIN: allow only to my own domain render my HTML inside an iframe.
DENY: do not allow my HTML to be rendered inside any iframe
ALLOW-FROM https://example.com/: allow specific domain to render my HTML inside an iframe
This is IIS config example:
<httpProtocol>
<customHeaders>
<add name="X-Frame-Options" value="SAMEORIGIN" />
</customHeaders>
</httpProtocol>
The solution to the question
If the web server activated the security feature it may cause a client-side SecurityError as it should.
For me i wanted to implement a 2-way handshake, meaning:
- the parent window will load faster then the iframe
- the iframe should talk to the parent window as soon as its ready
- the parent is ready to receive the iframe message and replay
this code is used to set white label in the iframe using [CSS custom property]
code:
iframe
$(function() {
window.onload = function() {
// create listener
function receiveMessage(e) {
document.documentElement.style.setProperty('--header_bg', e.data.wl.header_bg);
document.documentElement.style.setProperty('--header_text', e.data.wl.header_text);
document.documentElement.style.setProperty('--button_bg', e.data.wl.button_bg);
//alert(e.data.data.header_bg);
}
window.addEventListener('message', receiveMessage);
// call parent
parent.postMessage("GetWhiteLabel","*");
}
});
parent
$(function() {
// create listener
var eventMethod = window.addEventListener ? "addEventListener" : "attachEvent";
var eventer = window[eventMethod];
var messageEvent = eventMethod == "attachEvent" ? "onmessage" : "message";
eventer(messageEvent, function (e) {
// replay to child (iframe)
document.getElementById('wrapper-iframe').contentWindow.postMessage(
{
event_id: 'white_label_message',
wl: {
header_bg: $('#Header').css('background-color'),
header_text: $('#Header .HoverMenu a').css('color'),
button_bg: $('#Header .HoverMenu a').css('background-color')
}
},
'*'
);
}, false);
});
naturally you can limit the origins and the text, this is easy-to-work-with code
i found this examlpe to be helpful:
[Cross-Domain Messaging With postMessage]
There is a workaround, actually, for specific scenarios.
If you have two processes running on the same domain but different ports, the two Windows can interact without limitations. (i.e. localhost:3000 & localhost:2000). To make this work, each window needs to change their domain to the shared origin:
document.domain = 'localhost'
This also works in the scenario that you are working with different subdomains on the same second-level domain, i.e. you are on john.site.example trying to access peter.site.example or just site.example
document.domain = 'site.example'
By explicitily setting document.domain; the browser will ignore the hostname difference and the Windows can be treated as coming from the 'same-origin'. Now, in a parent window, you can reach into the iframe: frame.contentWindow.document.body.classList.add('happyDev')
If you have control over the content of the iframe - that is, if it is merely loaded in a cross-origin setup such as on Amazon Mechanical Turk - you can circumvent this problem with the <body onload='my_func(my_arg)'> attribute for the inner html.
For example, for the inner html, use the this html parameter (yes - this is defined and it refers to the parent window of the inner body element):
<body onload='changeForm(this)'>
In the inner html :
function changeForm(window) {
console.log('inner window loaded: do whatever you want with the inner html');
window.document.getElementById('mturk_form').style.display = 'none';
</script>
I experienced this error when trying to embed an iframe and then opening the site with Brave. The error went away when I changed to "Shields Down" for the site in question. Obviously, this is not a full solution, since anyone else visiting the site with Brave will run into the same issue. To actually resolve it I would need to do one of the other things listed on this page. But at least I now know where the problem lies.
I would like to add Java Spring specific configuration that can effect on this.
In Web site or Gateway application there is a contentSecurityPolicy setting
in Spring you can find implementation of WebSecurityConfigurerAdapter sub class
contentSecurityPolicy("
script-src 'self' [URLDomain]/scripts ;
style-src 'self' [URLDomain]/styles;
frame-src 'self' [URLDomain]/frameUrl...
...
.referrerPolicy(ReferrerPolicyHeaderWriter.ReferrerPolicy.STRICT_ORIGIN_WHEN_CROSS_ORIGIN)
Browser will be blocked if you have not define safe external contenet here.
Open the start menu
Type windows+R or open "Run
Execute the following command.
chrome.exe --user-data-dir="C://Chrome dev session" --disable-web-security

How to get Hikvision DeepinViews license plate number from URL?

I cant find the solution anywhere and mine doesn't seem to work.
I just want to see the last plate string in the browser,or the few last plates,doesn't matter.
http://login:password#MY.IP/ISAPI/Traffic/channels/1/vehicleDetect/plates/
<AfterTime><picTime>2021-12-09T09:07:15Z</picTime></AfterTime>
I do have a plate taken exactly at the time im using in pictime,but the result im getting is;
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<ResponseStatus xmlns="
http://www.hikvision.com/ver20/XMLSchema
" version="2.0">
<requestURL>
/ISAPI/Traffic/channels/1/vehicleDetect/plates/
<AfterTime>
<picTime>2021-12-09T09:01:15Z</picTime>
</AfterTime>
</requestURL>
<statusCode>4</statusCode>
<statusString>Invalid Operation</statusString>
<subStatusCode>invalidOperation</subStatusCode>
</ResponseStatus>
POSTMAN
Edit:
Are you certain that the ISAPI setting is enabled in the camera configuration?
It's not possible in the browser without some tool to send and process your API request.
Have you tried using Postman?
Don't forget to use a Digest Auth header.
from requests.auth import HTTPDigestAuth
import requests
url = 'http://<Your IP>/ISAPI/Traffic/channels/1/vehicleDetect/plates/'
data = "<AfterTime><picTime>20220912T192011+0400</picTime></AfterTime>"
r=requests.get(url, data =data,auth=HTTPDigestAuth('admin', 'password'))
print(r.text)
Try this one after enabling this setting in camera
Screenshot

Error while using searchKitManager in React

I am trying to use the searchKitManager inside react-admin I provided the parameters etc according to the docs but when I run the code it throws errors. Here is how the code works
React Admin is running on http://localhost:3000
Golang backend is running on http://localhost:3006
ElasticSearch is running on http://localhost:9200
When data is inserted in mysql database using golang code it is also inserted in elasticsearch later on in one of my display component I call the above searchkitManager as follows
let apiUrl= 'http://localhost:9200/donate' // what link should I pass, url to elasticsearch or url to my backend
const searchkit = new SearchkitManager('/', {
searchUrlPath: `${apiUrl}/_search`,
});
This code will throw 404 Not Found or 400 Bad Request error but the API works in postman
if I change the above link to
let apiUrl= 'http://localhost:9200/donate' // what link should I pass, url to elasticsearch or url to my backend
const searchkit = new SearchkitManager('/', {
searchUrlPath: `${apiUrl}/_doc/`,
});
I am not getting anything at all sometimes it no error in console and sometimes 400 Bad Request or 405 Post Not Allowed
One last thing the link I am providing as for searchUrlPath should be like that or not? or should I pass the apiUrl in place of /? I tried that as well but just to make sure.
Any kind of help will be really appreciated.
Try doing this:
const elasticSearchUrl = 'http://localhost:9200/<your_index>';
const searchkit = new SearchkitManager(elasticSearchUrl );

Using bottle.py and blobstore GAE

I recently started using bottle and GAE blobstore and while I can upload the files to the blobstore I cannot seem to find a way to download them from the store.
I followed the examples from the documentation but was only successful on the uploading part. I cannot integrate the example in my app since I'm using a different framework from webapp/2.
How would I go about creating an upload handler and download handler so that I can get the key of the uploaded blob and store it in my data model and use it later in the download handler?
I tried using the BlobInfo.all() to create a query the blobstore but I'm not able to get the key name field value of the entity.
This is my first interaction with the blobstore so I wouldn't mind advice on a better approach to the problem.
For serving a blob I would recommend you to look at the source code of the BlobstoreDownloadHandler. It should be easy to port it to bottle, since there's nothing very specific about the framework.
Here is an example on how to use BlobInfo.all():
for info in blobstore.BlobInfo.all():
self.response.out.write('Name:%s Key: %s Size:%s Creation:%s ContentType:%s<br>' % (info.filename, info.key(), info.size, info.creation, info.content_type))
for downloads you only really need to generate a response that includes the header "X-AppEngine-BlobKey:[your blob_key]" along with everything else you need like a Content-Disposition header if desired. or if it's an image you should probably just use the high performance image serving api, generate a url and redirect to it.... done
for uploads, besides writing a handler for appengine to call once the upload is safely in blobstore (that's in the docs)
You need a way to find the blob info in the incoming request. I have no idea what the request looks like in bottle. The Blobstoreuploadhandler has a get_uploads method and there's really no reason it needs to be an instance method as far as I can tell. So here's an example generic implementation of it that expects a webob request. For bottle you would need to write something similar that is compatible with bottles request object.
def get_uploads(request, field_name=None):
"""Get uploads for this request.
Args:
field_name: Only select uploads that were sent as a specific field.
populate_post: Add the non blob fields to request.POST
Returns:
A list of BlobInfo records corresponding to each upload.
Empty list if there are no blob-info records for field_name.
stolen from the SDK since they only provide a way to get to this
crap through their crappy webapp framework
"""
if not getattr(request, "__uploads", None):
request.__uploads = {}
for key, value in request.params.items():
if isinstance(value, cgi.FieldStorage):
if 'blob-key' in value.type_options:
request.__uploads.setdefault(key, []).append(
blobstore.parse_blob_info(value))
if field_name:
try:
return list(request.__uploads[field_name])
except KeyError:
return []
else:
results = []
for uploads in request.__uploads.itervalues():
results += uploads
return results
For anyone looking for this answer in future, to do this you need bottle (d'oh!) and defnull's multipart module.
Since creating upload URLs is generally simple enough and as per GAE docs, I'll just cover the upload handler.
from bottle import request
from multipart import parse_options_header
from google.appengine.ext.blobstore import BlobInfo
def get_blob_info(field_name):
try:
field = request.files[field_name]
except KeyError:
# Maybe form isn't multipart or file wasn't uploaded, or some such error
return None
blob_data = parse_options_header(field.content_type)[1]
try:
return BlobInfo.get(blob_data['blob-key'])
except KeyError:
# Malformed request? Wrong field name?
return None
Sorry if there are any errors in the code, it's off the top of my head.

Resources