PhpPhantom request returns '405 Not Allowed'. Any way to still scrape the data? - screen-scraping

I am trying to scrape data from a site that returns 405 Not Allowed and also loads its content using AJAX. Is there any way I can still scrape the data?

I solved this by setting a real browser user agent, chosen at random from the following list:
$chrome_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36';
$firefox_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0';
$ie_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko';
$edge_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063';
$agents = array($chrome_agent, $firefox_agent, $edge_agent, $ie_agent);
// Pick one of the user-agent strings at random for this request
$user_agent = $agents[array_rand($agents)];
Ref: https://github.com/jonnnnyw/php-phantomjs/issues/208
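The same idea outside PHP, as a minimal sketch using only Python's standard library (urllib.request). It assumes the site only checks the User-Agent header; it does not execute JavaScript, so AJAX-loaded content still needs a headless browser such as PhantomJS. The function name build_request is hypothetical.

```python
import random
import urllib.request
from typing import Optional

# The four desktop user-agent strings from the PHP answer above
AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0",
    "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063",
]


def build_request(url: str, rng: Optional[random.Random] = None) -> urllib.request.Request:
    """Return a GET request for `url` whose User-Agent is one of AGENTS, picked at random."""
    rng = rng or random.Random()
    return urllib.request.Request(url, headers={"User-Agent": rng.choice(AGENTS)})


if __name__ == "__main__":
    req = build_request("https://example.com/")
    # urllib stores header names via str.capitalize(), hence "User-agent"
    print(req.get_header("User-agent"))
```

To actually fetch the page, pass the request to urllib.request.urlopen(); building the request separately keeps the user-agent choice easy to inspect.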

Related

React Nginx Proxy Pass every file loading index.html

I've been searching everywhere for a solution: when I go to my domain, all my static files just return index.html, which gives me an
Uncaught SyntaxError: Unexpected token '<' error.
My setup is as follows:
A server runs nginx for multiple domains; the site in question has the following config file:
server {
listen 80;
listen [::]:80;
server_name domain.com www.domain.com;
return 302 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
listen [::]:443 ssl http2;
ssl_certificate /etc/ssl/domain/cert.pem;
ssl_certificate_key /etc/ssl/domain/key.pem;
server_name domain.com www.domain.com;
location / {
proxy_pass http://10.0.0.41:80;
}
location /api {
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Server $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_pass http://10.0.0.41:3000;
}
}
Here 10.0.0.41 is another server hosting two Docker containers: one for my React app (served by nginx) and one for my Express backend.
The reverse proxy to the Express app works perfectly, but when I visit domain.com, my JS files come back as index.html and don't load, producing the error above.
When I visit 10.0.0.41 in my browser, everything loads and works correctly; it only fails when coming through the domain.
This is the nginx config file for the second server:
server {
listen 80;
listen [::]:80;
root /usr/share/nginx/html;
location / {
try_files $uri /index.html;
}
location ~ .(static)/(js|css|media)/(.+)$ {
try_files $uri $uri/ /$1/$2/$3;
}
}
I've tried everything I can find: adding that last location block to the second nginx config, removing it, and changing homepage in package.json.
I am using React Router, with <Route exact path="/"
This has been driving me crazy. Any help would be greatly appreciated, and let me know if I left out any important information.
Difference in requests:
10.0.0.41:80
10.0.0.142 - - [02/Aug/2022:13:42:53 +0000] "GET / HTTP/1.1" 200 644 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"
10.0.0.142 - - [02/Aug/2022:13:42:53 +0000] "GET /static/css/main.69847ccd.css HTTP/1.1" 200 2261 "http://10.0.0.41/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"
10.0.0.142 - - [02/Aug/2022:13:42:53 +0000] "GET /static/js/main.10f72de5.js HTTP/1.1" 200 514201 "http://10.0.0.41/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"
10.0.0.142 - - [02/Aug/2022:13:42:53 +0000] "GET /static/js/423.0a0d8ebb.chunk.js HTTP/1.1" 200 3280 "http://10.0.0.41/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"
10.0.0.142 - - [02/Aug/2022:13:42:53 +0000] "GET /favicon.ico HTTP/1.1" 200 3150 "http://10.0.0.41/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"
www.domain.com
10.0.0.101 - - [02/Aug/2022:13:44:37 +0000] "GET / HTTP/1.1" 200 644 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "my_ip, cloudflare_ip"
10.0.0.101 - - [02/Aug/2022:13:44:37 +0000] "GET /manifest.json HTTP/1.1" 304 0 "https://www.my_domain.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "my_ip, cloudflare_ip"
10.0.0.142 is the machine I'm testing with, and 10.0.0.101 is the first nginx server.
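The try_files $uri /index.html; line in the second server's config shows how this symptom arises: whenever the requested path does not match a file under root, nginx falls back to /index.html, and an HTML document beginning with < is exactly what makes the browser report Unexpected token '<' when it expected JavaScript. Below is a simplified Python model of that lookup, not nginx itself: it ignores the internal redirect and location re-matching nginx performs for the fallback, and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_try_files(root: Path, uri: str, fallback: str = "/index.html") -> Path:
    """Model `try_files $uri /index.html;`: serve the file if it exists under
    root, otherwise serve the fallback. (nginx normalizes the URI first; this
    model assumes `uri` is already clean and contains no '..' segments.)"""
    candidate = root / uri.lstrip("/")
    if candidate.is_file():
        return candidate
    return root / fallback.lstrip("/")


def demo():
    """Build a throwaway web root and resolve one existing and one missing JS path."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "static" / "js").mkdir(parents=True)
        (root / "index.html").write_text("<!doctype html><html></html>")
        (root / "static" / "js" / "main.js").write_text("console.log('ok');")
        found = resolve_try_files(root, "/static/js/main.js").read_text()
        missing = resolve_try_files(root, "/static/js/missing.js").read_text()
    return found, missing


if __name__ == "__main__":
    found, missing = demo()
    print("existing file ->", found)
    print("missing file  ->", missing)  # HTML served where JS was expected
```

In other words, the error only says that the JS request was answered with index.html; the thing to find is which server answered it and why the path did not match a real file there.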

Xamarin WebView request desktop site

Is there a way to ask, via C#, the iOS and Android WebView components to request the desktop sites?
You need to do this per platform.
Android
On Android you have to implement a custom renderer. Add this to your Android project:
// This line goes directly below the using directives, before the namespace declaration
[assembly: ExportRenderer(typeof(WebView), typeof(DesktopWebViewRenderer))]
// This class goes inside your namespace
public class DesktopWebViewRenderer : WebViewRenderer
{
    protected override void OnElementChanged(ElementChangedEventArgs<WebView> e)
    {
        base.OnElementChanged(e);
        // Control is the native Android WebView; guard against it not being available
        if (Control != null)
        {
            Control.Settings.UserAgentString = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.4) Gecko/20100101 Firefox/4.0";
        }
    }
}
iOS
Xamarin.Forms uses UIWebView on iOS, so you have to call
NSUserDefaults.StandardUserDefaults.RegisterDefaults(new NSDictionary("UserAgent",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A"));
somewhere in your startup code, e.g. in FinishedLaunching of your AppDelegate.
Set the user agent string appropriately. There is no way to do this directly in Xamarin.Forms; you need to write a custom renderer.
iOS UIWebView
NSUserDefaults.StandardUserDefaults.RegisterDefaults(new NSDictionary("UserAgent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A"));
iOS 9+ WKWebView
web.CustomUserAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.99 Safari/537.36";
Android
string agent = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.4) Gecko/20100101 Firefox/4.0";
web.Settings.UserAgentString = agent;

Why does my code uncomment this part of the code out

I've been making a simple HTTP flooder with wget and a custom user agent, but when flooding I noticed this in the log:
125.27.78.172 - - [26/Apr/2016:12:38:45 -0500] "GET / HTTP/1.1" 403 4961 "-" "Wget"
In case the problem isn't clear: I asked my friend to flood my VPS, and this is what his requests looked like:
208.67.1.176 - - [26/Apr/2016:12:48:32 -0500] "GET / HTTP/1.0" 403 4961 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36"
Why is my user agent not being sent with this code?
sprintf(command, "wget -O /tmp/fff --header="Accept: text/html" --user-agent="Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36" http://208.67.1.176/ ");
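(If you're wondering how this is a flooder, the main code runs it in a loop.)
Escape the inner double quotes with backslashes, so they become part of the string instead of terminating it: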
sprintf(command, "wget -O /tmp/fff --header=\"Accept: text/html\" --user-agent=\"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36\" http://208.67.1.176/ ");

Apache Forbidden on Virtual Directory

I have been trying to configure Apache to list the log files stored in /var/log/squid/ so that I can download them from the server to my local PC when required. I have configured the Alias as follows:
Alias /squid/ "/var/log/squid/"
<Directory "/var/log/squid/">
Options None
AllowOverride All
Order allow,deny
Allow from all
</Directory>
But I keep getting the 403 Forbidden error message when I try to browse to the directory.
Apache access log (CentOS):
172.16.200.132 - - [10/May/2014:14:34:01 +0100] "GET /squid/ HTTP/1.1" 403 288 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36"
Does anyone have any suggestions? I have tried changing the user Apache runs as to admin, a user that has full access to the /var/log/squid/ directory but no access to system services (/etc/init.d/) and no ability to run commands such as halt or service.
Any suggestions, please?
Error Log from Apache
[Mon May 12 17:53:11 2014] [error] [client 172.16.200.132] (13)Permission denied: Can't open directory for index: /var/log/squid/
From the log the OP posted, the interesting part is "(13)Permission denied: Can't open directory for index:".
To solve this, change the line Options None to Options +Indexes.
References: official documentation
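The (13) in that error log line is the operating system's errno (EACCES on Linux). A minimal Python sketch for checking whether an account can list the directory at the filesystem level; the function names are hypothetical. os.access checks against the process's real user and group IDs, so run it as the account Apache uses (on CentOS this is assumed to be the default apache user, e.g. sudo -u apache python3 check_listing.py /var/log/squid/).

```python
import os
import sys


def can_list_directory(path: str) -> bool:
    """Listing a directory needs read (R_OK) and search (X_OK) permission on it."""
    return os.access(path, os.R_OK | os.X_OK)


def explain_listing_error(path: str) -> str:
    """Try to list `path` and report the errno the way Apache's error log does."""
    try:
        os.listdir(path)
    except OSError as exc:
        # e.g. "(13) Permission denied: /var/log/squid/" on Linux
        return f"({exc.errno}) {exc.strerror}: {path}"
    return "ok"


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/log/squid/"
    print("listable:", can_list_directory(target))
    print(explain_listing_error(target))
```

If this prints a permission error for the Apache account, the Options change alone will not be enough; the directory's filesystem permissions also have to allow that account to read it.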

Error 404 on uploading blob to production server on google app engine

I'm building a system where users upload articles to my app, and I need to store them. I've read the tutorial about blob handlers in Google's documentation and it worked, but only locally.
When I test the app on the development server everything is fine, but on the production server I get Error 404 and the following logs:
2014-02-17 08:59:28.490 /http://ciro-app-id.appspot.com/_ah/upload/AMmfu6ah2vpKNsIDSzlpYPqAgnQ_zznnUwDweG571CgMMnGlluXc1GJS0i42UYYOKVZNQMBhzyY3grQFeCgD4hf4usx_YeMwy4n_93qM-QFegsMIFHDkNovRcJ9Rnl9li91bo4bdClfV/ALBNUaYAAAAAUwJCQ_kw2ANG1Tnvs9OIU6cAyOUDscqL/ 404 19ms 0kb Mozilla/5.0 (X11; Linux i686 (x86_64)) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36 module=default version=1
186.226.15.242 - - [17/Feb/2014:08:59:28 -0800] "POST /http://ciro-app-id.appspot.com/_ah/upload/AMmfu6ah2vpKNsIDSzlpYPqAgnQ_zznnUwDweG571CgMMnGlluXc1GJS0i42UYYOKVZNQMBhzyY3grQFeCgD4hf4usx_YeMwy4n_93qM-QFegsMIFHDkNovRcJ9Rnl9li91bo4bdClfV/ALBNUaYAAAAAUwJCQ_kw2ANG1Tnvs9OIU6cAyOUDscqL/ HTTP/1.1" 404 188 "http://ciro-app-id.appspot.com/enviar" "Mozilla/5.0 (X11; Linux i686 (x86_64)) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36" "ciro-app-id.appspot.com" ms=20 cpu_ms=0 cpm_usd=0.000021 app_engine_release=1.9.0 instance=00c61b117c6c9b0c25f5b86e2eadac83e2908691
Here is my code: https://drive.google.com/file/d/0B1-lpPH97tV2dzN6aURYVENCMzQ/edit?usp=sharing
Here is my app: ciro-app-id.appspot.com
Try it yourself
Login credentials:
Email: ciromoraismedeiros@gmail.com
Password: 123
Go to ciro-app-id.appspot.com/enviar, fill in the form and submit it.
Note: I'm Brazilian, so everything is in Portuguese.
Notice the leading "/" in your request log? In /templates/enviar_artigo.html, change
<form action='/{{upload_url}}' ...>
to
<form action='{{upload_url}}' ...>
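Why the leading slash breaks the upload: the log shows that upload_url is already a full absolute URL (http://ciro-app-id.appspot.com/_ah/upload/...), so prefixing "/" turns the whole thing into a path on the current host. A minimal sketch with Python's urllib.parse.urlsplit; the upload key and the form_action helper are hypothetical. urlsplit follows RFC 3986 rather than the browser's URL parser, but both treat a reference that starts with a single slash as a path on the current host.

```python
from urllib.parse import urlsplit

# Hypothetical upload URL with the same shape as the one in the log
UPLOAD_URL = "http://ciro-app-id.appspot.com/_ah/upload/EXAMPLE_KEY/"


def form_action(upload_url: str, leading_slash: bool) -> str:
    """Render the form's action attribute the way each template version does."""
    return ("/" if leading_slash else "") + upload_url


broken = urlsplit(form_action(UPLOAD_URL, leading_slash=True))
fixed = urlsplit(form_action(UPLOAD_URL, leading_slash=False))

# With the leading slash there is no scheme or host, so the browser POSTs to
# "/http://ciro-app-id.appspot.com/_ah/upload/..." on the current host,
# which is the path that returns 404 in the log.
print(repr(broken.scheme), repr(broken.netloc), broken.path)
# Without it, the request goes to the blobstore upload handler's own path.
print(repr(fixed.scheme), repr(fixed.netloc), fixed.path)
```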
