url fetch too many repeated redirects - google-app-engine

I am trying to load a url and I get this error:
DownloadError: ApplicationError: 2 Too many repeated redirects
This is the code I am using:
from google.appengine.api import urlfetch
headers = { 'User-Agent' : 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; de-at) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1' }
url = "http://www.cafebonappetit.com/menu/your-cafe/collins-cmc/cafes/details/50/collins-bistro"
# Note: headers is defined above but is not passed to fetch() here.
cmcHTM = urlfetch.fetch(url=url)
cmcHTML = str(cmcHTM.content)
I checked the redirects of this website at http://www.internetofficer.com/seo-tool/redirect-check/
and found that the site redirects to itself! So urlfetch seems to be going in circles trying to load this page.
Meanwhile, this page loads just fine in my browser.
So I tried using this code:
cmcHTM = urlfetch.fetch(url=url,
                        follow_redirects=False,
                        deadline=100)
This just returns nothing, though. Is there any way to get this HTML?

Sorry for the delayed response. I found this, which worked:
import Cookie
import urlparse
from google.appengine.api import urlfetch

class URLOpener:
    def __init__(self):
        self.cookie = Cookie.SimpleCookie()

    def open(self, url, data=None):
        if data is None:
            method = urlfetch.GET
        else:
            method = urlfetch.POST
        while url is not None:
            response = urlfetch.fetch(url=url,
                                      payload=data,
                                      method=method,
                                      headers=self._getHeaders(self.cookie),
                                      allow_truncated=False,
                                      follow_redirects=False,
                                      deadline=10)
            data = None  # Next request will be a GET, so no need to send the data again.
            method = urlfetch.GET
            self.cookie.load(response.headers.get('set-cookie', ''))  # Load the cookies from the response
            location = response.headers.get('location')
            # Location may be relative, so resolve it against the current URL.
            url = urlparse.urljoin(url, location) if location else None
        return response

    def _getHeaders(self, cookie):
        # Host is omitted so that it always matches the URL being fetched.
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)',
            'Cookie': self._makeCookieHeader(cookie)
        }
        return headers

    def _makeCookieHeader(self, cookie):
        cookieHeader = ""
        for value in cookie.values():
            cookieHeader += "%s=%s; " % (value.key, value.value)
        return cookieHeader
I guess the key is the while loop, which follows the redirects manually using the Location response header and sends back any cookies the site sets along the way.
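The same loop can be sketched without App Engine, assuming (as the code above suggests) that the site keeps redirecting to itself until it sees a cookie it has set. This is a minimal Python 3 sketch: `fetch_following_redirects` and the `fetch(url, headers)` callable it takes are hypothetical, with `fetch` returning `(status, headers, body)` and lowercase header names. On App Engine you would wrap `urlfetch.fetch(..., follow_redirects=False)` in such a callable. Unlike the `while url is not None` loop, it gives up after a fixed number of hops instead of looping forever.

```python
from http.cookies import SimpleCookie
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_following_redirects(fetch, url, max_hops=10):
    """Follow redirects by hand, replaying cookies the server sets.

    `fetch` is a hypothetical callable: fetch(url, headers) -> (status, headers, body),
    where the response headers dict uses lowercase keys.
    """
    cookies = SimpleCookie()
    for _ in range(max_hops):
        # Send back every cookie collected so far, as a browser would.
        cookie_header = "; ".join(
            "%s=%s" % (name, morsel.value) for name, morsel in cookies.items()
        )
        request_headers = {"Cookie": cookie_header} if cookie_header else {}
        status, headers, body = fetch(url, request_headers)
        if headers.get("set-cookie"):
            cookies.load(headers["set-cookie"])
        location = headers.get("location")
        if status not in REDIRECT_CODES or not location:
            return status, body
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("gave up after %d redirects" % max_hops)
```

Capping the hops turns a self-redirecting site into a clear error rather than a hung request.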

I think this is a problem with the site, not your code. The site seems designed to redirect to itself when it doesn't detect some header that a browser customarily sends. For example, when I access it with curl I get an empty body with a 302 redirect to itself, but in the browser I get the page. You'd have to ask the site owner what they are checking for.

Related

Fill a form with a captcha solving script using requests

Basically, the script runs fine and I get confirmation that my CAPTCHA has been solved. The problem arises when the script submits the rest of the form's input.
Any idea where it is coming from?
import requests
from random import randint
from time import sleep
# Add these values
API_KEY = 'ApiKey' # Your 2captcha API KEY
site_key = 'SiteKey' # site-key, read the 2captcha docs on how to get this
url = 'https://site' # example url
proxy = 'proxy' # example proxy
proxy = {'http': 'http://' + proxy, 'https': 'https://' + proxy}
s = requests.Session()
# here we post site key to 2captcha to get captcha ID (and we parse it here too)
captcha_id = s.post("http://2captcha.com/in.php?key={}&method=userrecaptcha&googlekey={}&pageurl={}".format(API_KEY, site_key, url), proxies=proxy).text.split('|')[1]
# then we parse gresponse from 2captcha response
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id), proxies=proxy).text
print("solving ref captcha...")
while 'CAPCHA_NOT_READY' in recaptcha_answer:
    sleep(5)
    recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id), proxies=proxy).text
recaptcha_answer = recaptcha_answer.split('|')[1]
# we make the payload for the post data here, use something like mitmproxy or fiddler to see what is needed
payload = {
'username' : 'username',
'password' : 'password' ,
'password_again' : 'password' ,
'email' : 'email@gmail.com' ,
'key': 'value',
'gresponse': recaptcha_answer # This is the response from 2captcha, which is needed for the post request to go through.
}
# then send the post request to the url
response = s.post('https://site',payload, proxies=proxy)
You only need to change the key 'gresponse' to 'g-recaptcha-response', which is the form field name the reCAPTCHA widget submits; then it works perfectly.
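The polling loop in the question waits forever, and if 2captcha answers with an error string instead of CAPCHA_NOT_READY, the later `split('|')[1]` raises an IndexError. A bounded version might look like this minimal Python sketch. `wait_for_token` and `build_payload` are hypothetical helpers, and treating every reply other than "OK|..." or CAPCHA_NOT_READY as a failure is an assumption; only the `g-recaptcha-response` key comes from the answer.

```python
import time


def wait_for_token(get_result, attempts=24, delay=5.0, sleep=time.sleep):
    """Poll a 2captcha-style result endpoint until a token is ready.

    `get_result` is a hypothetical zero-argument callable returning the raw
    text of res.php, e.g. "CAPCHA_NOT_READY" or "OK|<token>".
    """
    for _ in range(attempts):
        text = get_result().strip()
        if text.startswith("OK|"):
            return text.split("|", 1)[1]
        if text != "CAPCHA_NOT_READY":
            # Assumption: any other reply (e.g. an ERROR_... code) is a failure.
            raise RuntimeError("captcha service returned: " + text)
        sleep(delay)
    raise TimeoutError("captcha not solved after %d attempts" % attempts)


def build_payload(form_fields, token):
    """Copy the form fields and add the token under reCAPTCHA's field name."""
    payload = dict(form_fields)
    payload["g-recaptcha-response"] = token
    return payload
```

In the question's script this would replace the `while` loop, e.g. `token = wait_for_token(lambda: s.get(result_url, proxies=proxy).text)`, where `result_url` is a hypothetical name for the res.php URL built there.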

GAE webapp2 Access-Control-Allow-Origin Error

I get this error while using webapp2 user authentication: No 'Access-Control-Allow-Origin' header is present on the requested resource.
How do I add the access header before the redirect?
Redirect code:
self.redirect(users.create_login_url(self.request.uri))
Code:
import webapp2
from google.appengine.api import users

class Authenticate(webapp2.RequestHandler):
    def get(self):
        user = users.get_current_user()
        cookie_value = self.request.cookies.get('user')
        if user == cookie_value and user != None:
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.write('Success')
        else:
            self.request.headers['Access-Control-Allow-Origin'] = '*'
            self.redirect(users.create_login_url(self.request.uri))
webapp2 defines two redirect functions, the handler method
def redirect(self, uri, permanent=False, abort=False, code=None, body=None)
and the module-level function
def redirect(uri, permanent=False, abort=False, code=None,
             body=None, request=None, response=None)
So, I think you should call the second one if you want to pass the response object:
return webapp2.redirect(users.create_login_url(self.request.uri),
                        response=self.response)
Refer to the webapp2 source code.
Before the self.redirect call, do
self.response.headers['Access-Control-Allow-Origin'] = '*'
or whatever value you wish to have for that header. self.redirect by default makes a new response object, but you can change that with
return webapp2.redirect(users.create_login_url(self.request.uri),
                        response=self.response)
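Both answers come down to the same point: the header has to be on the very response that carries the 302. A framework-free WSGI sketch (plain WSGI, not webapp2's API) makes that visible. `LOGIN_BASE` is a hypothetical placeholder standing in for `users.create_login_url(...)`.

```python
from urllib.parse import quote

# Hypothetical stand-in for users.create_login_url(); not a real endpoint.
LOGIN_BASE = "https://login.example.com/?continue="


def login_redirect_app(environ, start_response):
    """Minimal WSGI app: a 302 to a login page that carries a CORS header."""
    login_url = LOGIN_BASE + quote(environ.get("PATH_INFO", "/"), safe="")
    headers = [
        ("Location", login_url),
        # The CORS header is set on the same response that carries the 302.
        ("Access-Control-Allow-Origin", "*"),
        ("Content-Length", "0"),
    ]
    start_response("302 Found", headers)
    return [b""]
```

For a quick manual check, the app can be served with `wsgiref.simple_server.make_server`.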

Unable to download a document from Google Cloud Storage

I am able to upload a document to, and download it from, Google Cloud Storage with a signed URL using HttpClient in Java. But when I put the same signed URL in a browser, I am unable to download the document from the link. I get the following error:
The request signature we calculated does not match the signature you provided. Check your Google secret key and signing method.
But when I check the "Shared Publicly" checkbox in the storage browser, I am able to download from the generated signed URL. I want to allow a user to download a document from the browser without marking it as shared publicly.
I want to confirm a confusing point:
For a document to be accessible to a user who does not have a Google account, do I also have to check the "Shared Publicly" checkbox in the storage browser after creating a signed URL?
I think that if the URL is signed, the checkbox should not be needed and a user without a Google account should be able to access the document, but in my case that is not happening. According to the link
https://developers.google.com/storage/docs/accesscontrol#About-CanonicalExtensionHeaders
it talks about Canonicalized_Extension_Headers, so I put this in my request header:
request.addHeader("x-goog-acl","public-read");
This is my code:
// construct URL
String url = "https://storage.googleapis.com/" + bucket + filename +
"?GoogleAccessId=" + GOOGLE_ACCESS_ID +
"&Expires=" + expiration +
"&Signature=" + URLEncoder.encode(signature, "UTF-8");
System.out.println(url);
HttpClient client = new DefaultHttpClient();
HttpPut request = new HttpPut(url);
request.addHeader("Content-Type", contentType);
request.addHeader("x-goog-acl","public-read");// when i put this i get error
request.addHeader("Authorization","OAuth 1/zVNpoQNsOSxZKqOZgckhpQ");
request.setEntity(new ByteArrayEntity(data));
HttpResponse response = client.execute(request);
When I add request.addHeader("x-goog-acl","public-read"); I get an HTTP/1.1 403 Forbidden error.
When I remove this line, the upload succeeds. It seems I need to set
request.addHeader("x-goog-acl","public-read") to make the document publicly accessible, but adding it to my code causes the error.
Any suggestions, please?
Finally solved it.
To use a signed URL from the browser, you have to set the matching HTTP header. From https://developers.google.com/storage/docs/accesscontrol#Construct-the-String:
Content_Type: Optional. If you provide this value the client (browser) must provide this HTTP header set to the same value.
Note the word "must". So if you include Content_Type in the string to sign, the browser must send the same Content-Type in its HTTP headers. When I set Content-Type in the browser request header, the error was finally solved.
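The rule is easier to see when the V2 string to sign is written out. This Python sketch follows the format in the linked section (method, Content-MD5, Content-Type, Expires, canonical x-goog-* headers, resource). The same rule probably explains the x-goog-acl 403 in the question: a header like that must also appear in the signed string. The helper names are hypothetical, and the RSA-SHA256 signing step with the service account key, which the PHP answer performs with Google_Signer_P12, is left out.

```python
import base64
from urllib.parse import quote


def string_to_sign(method, expires, resource, content_type="", content_md5="",
                   goog_headers=None):
    """Build the V2 string to sign for a Cloud Storage signed URL.

    Every value put here (Content-Type, x-goog-* headers) must later be sent
    by the client with exactly the same value, or the signatures differ.
    """
    goog_headers = goog_headers or {}
    # Canonical extension headers: lowercase names, sorted, one "name:value\n" each.
    canonical = "".join(
        "%s:%s\n" % (name.lower(), value.strip())
        for name, value in sorted(goog_headers.items(), key=lambda kv: kv[0].lower())
    )
    return "%s\n%s\n%s\n%d\n%s%s" % (
        method, content_md5, content_type, expires, canonical, resource)


def signed_url(bucket, obj, access_id, expires, signature):
    """Assemble the URL; `signature` is the raw RSA-SHA256 signature bytes."""
    encoded = quote(base64.b64encode(signature).decode("ascii"), safe="")
    return ("https://storage.googleapis.com/%s/%s?GoogleAccessId=%s&Expires=%d&Signature=%s"
            % (bucket, quote(obj), quote(access_id, safe="@"), expires, encoded))
```

A plain browser GET sends no Content-Type, so a URL meant for downloading is usually signed with an empty Content_Type, as the PHP answer does for GET.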
This works for me:
set_include_path("../src/" . PATH_SEPARATOR . get_include_path());
require_once 'Google/Client.php';
function signed_storageURL($filename, $bucket, $p12_certificate_path, $access_id, $method = 'GET', $duration = 3600 )
{
$expires = time() + $duration; // $duration is in seconds (3600 = one hour)
$content_type = ($method == 'PUT') ? 'application/x-www-form-urlencoded' : '';
$to_sign = ($method."\n"."\n".$content_type."\n".$expires."\n".'/'.$bucket.'/'.$filename);
$signature = '';
$signer = new Google_Signer_P12(file_get_contents($p12_certificate_path), 'notasecret');
$signature = $signer->sign($to_sign);
$signature = urlencode( base64_encode( $signature ) );
return ('https://'.$bucket.'.commondatastorage.googleapis.com/'.$filename.'?GoogleAccessId='.$access_id.'&Expires='.$expires.'&Signature='.$signature);
}
$url = signed_storageURL(rawurlencode("áéíóú espaço & test - =.jpg"),'mybucket', 'mykey.p12','myaccount@developer.gserviceaccount.com');
echo ''.$url.'';

Basic Authentication in CakePHP

I am trying to set up Basic Authentication for my CakePHP app so I can use it as an API for an upcoming mobile application. However, if I pass the following:
cameron:password@dev.driz.co.uk/basic/locked/
where cameron is the username, password is the password, and the rest is the domain and application; locked is a method that requires authentication. (Obviously, the password shown in this example is not the real one.)
(Q1) I am prompted for a username and password, but the username and password are in fact correct, because if I then type them into the prompt they work. Why would this happen? Haven't I just passed the username and password?
I can't see anything wrong with the way I have set this up in CakePHP.
I set Basic Auth in AppController as:
public $components = array('Auth');
function beforeFilter()
{
parent::beforeFilter();
$this->Auth->authorize = array('Controller');
$this->Auth->authenticate = array('Basic');
$this->Auth->sessionKey = false;
$this->Auth->unauthorizedRedirect = false;
}
(Q2) Even though I have set sessionKey to false and unauthorizedRedirect to false, if the user cancels the prompt they are redirected to the login page. Any ideas on how to stop this from happening? Ideally I want to send back a JSON response or a 401 status code (depending on whether or not it's an AJAX request).
So something like:
if ($this->request->is('ajax')) {
$response = json_encode(
array(
'meta'=>array(
'code'=>$this->response->statusCode(401),
'in'=>round(microtime(true) - TIME_START, 4)
),
'response'=>array(
'status'=>'error',
'message'=>'401 Not Authorized'
)
)
);
// Handle JSONP
if(isset($_GET['callback'])) {
$response = $_GET['callback'] . '(' . $response . ')';
}
// Return JSON
$this->autoRender = false;
$this->response->type('json');
$this->response->body($response);
} else {
header('HTTP/1.0 401 Unauthorized');
}
But where would this go in the application logic? It needs to happen for ALL requested methods that require authentication whenever the user fails or cancels authentication.
(Q3) If you enter incorrect details, you are just shown the prompt again until you get the username/password right or hit Cancel. How can I make it show an error?
Any ideas on these three issues (marked with sub-question numbers)?
Update: This is how I send the headers to the API:
"use strict";jQuery.base64=(function($){var _PADCHAR="=",_ALPHA="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",_VERSION="1.0";function _getbyte64(s,i){var idx=_ALPHA.indexOf(s.charAt(i));if(idx===-1){throw"Cannot decode base64"}return idx}function _decode(s){var pads=0,i,b10,imax=s.length,x=[];s=String(s);if(imax===0){return s}if(imax%4!==0){throw"Cannot decode base64"}if(s.charAt(imax-1)===_PADCHAR){pads=1;if(s.charAt(imax-2)===_PADCHAR){pads=2}imax-=4}for(i=0;i<imax;i+=4){b10=(_getbyte64(s,i)<<18)|(_getbyte64(s,i+1)<<12)|(_getbyte64(s,i+2)<<6)|_getbyte64(s,i+3);x.push(String.fromCharCode(b10>>16,(b10>>8)&255,b10&255))}switch(pads){case 1:b10=(_getbyte64(s,i)<<18)|(_getbyte64(s,i+1)<<12)|(_getbyte64(s,i+2)<<6);x.push(String.fromCharCode(b10>>16,(b10>>8)&255));break;case 2:b10=(_getbyte64(s,i)<<18)|(_getbyte64(s,i+1)<<12);x.push(String.fromCharCode(b10>>16));break}return x.join("")}function _getbyte(s,i){var x=s.charCodeAt(i);if(x>255){throw"INVALID_CHARACTER_ERR: DOM Exception 5"}return x}function _encode(s){if(arguments.length!==1){throw"SyntaxError: exactly one argument required"}s=String(s);var i,b10,x=[],imax=s.length-s.length%3;if(s.length===0){return s}for(i=0;i<imax;i+=3){b10=(_getbyte(s,i)<<16)|(_getbyte(s,i+1)<<8)|_getbyte(s,i+2);x.push(_ALPHA.charAt(b10>>18));x.push(_ALPHA.charAt((b10>>12)&63));x.push(_ALPHA.charAt((b10>>6)&63));x.push(_ALPHA.charAt(b10&63))}switch(s.length-imax){case 1:b10=_getbyte(s,i)<<16;x.push(_ALPHA.charAt(b10>>18)+_ALPHA.charAt((b10>>12)&63)+_PADCHAR+_PADCHAR);break;case 2:b10=(_getbyte(s,i)<<16)|(_getbyte(s,i+1)<<8);x.push(_ALPHA.charAt(b10>>18)+_ALPHA.charAt((b10>>12)&63)+_ALPHA.charAt((b10>>6)&63)+_PADCHAR);break}return x.join("")}return{decode:_decode,encode:_encode,VERSION:_VERSION}}(jQuery));
$(document).ready(function(){
var username = 'cameron';
var password = 'password';
$.ajax({
type: 'GET',
url: 'http://dev.driz.co.uk/basic/locked',
beforeSend : function(xhr) {
var base64 = $.base64.encode(username + ':' + password);
xhr.setRequestHeader("Authorization", "Basic " + base64);
},
dataType: 'jsonp',
success: function(data) {
console.log(data);
},
error: function(a,b,c) {
//console.log(a,b,c);
}
});
});
Q1
You don't specify how you visit the protected URL (dev.driz.co.uk/basic/locked). Are you sure that you are setting up the request headers properly? You need to Base64-encode the username:password pair.
When your first request fails, the browser jumps in with the prompt; the fact that the second attempt succeeds means the browser builds the header properly for you.
Have a look at your request headers to see what you send the first time and what the browser sends the second time.
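Comparing the two requests mostly means checking this one header. Its value is the word Basic, a space, and the Base64 encoding of username:password. A minimal Python sketch of building it and decoding it on the server side (both helper names are hypothetical):

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic auth."""
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def parse_basic_auth(header_value):
    """Inverse of basic_auth_header: return (username, password) or None."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):  # binascii.Error subclasses ValueError
        return None
    # Split on the first colon only: passwords may contain ':'.
    username, sep, password = decoded.partition(":")
    return (username, password) if sep else None
```

For example, `basic_auth_header("cameron", "password")` gives `Basic Y2FtZXJvbjpwYXNzd29yZA==`, which is what the browser should send after you fill in the prompt.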
Q2
When Basic auth fails, your server sends a 401 with a WWW-Authenticate: Basic header, which the browser picks up and presents you with the prompt. That is built-in, normal behavior in all browsers and has been for ages; you can't change that.
As for canceling and being redirected to the login page: Auth had some API changes as of 2.4 that are highlighted in the book. Before version 2.4 you are always redirected to loginAction.
Finally, let Auth do the work for you by setting it up properly, and don't attempt to hardwire the responses yourself as in the code you suggest. You also shouldn't ever use PHP's header() in CakePHP; use CakeResponse::header() (e.g. $this->response->header()) instead.
Q3
As answered in Q2, you can't use Basic with a 401 and not trigger the prompt. Either change the authentication scheme in the WWW-Authenticate header (perhaps to a name like Basic-x instead of Basic), or don't send a 401 on failure: send, e.g., a 200 or 400 and add an error message explaining the situation.
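The first option can be sketched framework-independently. This is a Python sketch, not CakePHP code: `auth_failure` is a hypothetical helper, `Basic-x` is the answer's example of a non-standard scheme name, and the JSON body mirrors the question's message. How the AJAX check is done (in CakePHP, `$this->request->is('ajax')`) is left to the caller.

```python
import json


def auth_failure(is_ajax, realm="API"):
    """Return (status, headers, body) for a failed login, per the Q3 answer.

    Browsers show their built-in prompt for a 401 carrying a Basic challenge,
    so AJAX callers get the answer's made-up "Basic-x" scheme instead.
    """
    if is_ajax:
        body = json.dumps({"status": "error", "message": "401 Not Authorized"})
        headers = [("WWW-Authenticate", 'Basic-x realm="%s"' % realm),
                   ("Content-Type", "application/json")]
        return 401, headers, body
    # Regular browser requests keep the standard challenge (and the prompt).
    headers = [("WWW-Authenticate", 'Basic realm="%s"' % realm),
               ("Content-Type", "text/plain")]
    return 401, headers, "401 Not Authorized"
```

The mobile client then reads the status and JSON body itself, while a browser visiting the URL directly still gets the normal prompt.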

Fetching url in python with google app engine

I'm using this code to send an HTTP request inside my app and then show the result:
def get(self):
    url = "http://www.google.com/"
    try:
        result = urllib2.urlopen(url)
        self.response.out.write(result)
    except urllib2.URLError, e:
        pass  # error handling omitted
I expect to get the HTML of the google.com page, but I only get the character ">". What's wrong with that?
Try using the urlfetch service instead of urllib2:
Import urlfetch:
from google.appengine.api import urlfetch
And this in your request handler:
def get(self):
    try:
        url = "http://www.google.com/"
        result = urlfetch.fetch(url)
        if result.status_code == 200:
            self.response.out.write(result.content)
        else:
            self.response.out.write("Error: " + str(result.status_code))
    except urlfetch.InvalidURLError:
        self.response.out.write("URL is an empty string or obviously invalid")
    except urlfetch.DownloadError:
        self.response.out.write("Server cannot be contacted")
See this document for more detail.
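You need to call the read() method to read the response body. Writing the response object itself most likely outputs its string representation (something like <addinfourl at ...>), which the browser swallows as an HTML tag, leaving only the ">" you see. It's also good practice to check the HTTP status and to close the response when you're done.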
Example:
url = "http://www.google.com/"
try:
    response = urllib2.urlopen(url)
    if response.code == 200:
        html = response.read()
        self.response.out.write(html)
    else:
        pass  # handle non-200 status codes here
    response.close()
except urllib2.URLError, e:
    pass
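The same read/check/close steps look like this in Python 3, where urllib2 became urllib.request. This is a sketch outside App Engine's Python 2 runtime; `fetch_html` is a hypothetical helper, and a `with` block takes care of closing the response even when an exception is raised.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the body of url decoded as text, or None on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            # urlopen already raises HTTPError (a URLError subclass) for 4xx/5xx;
            # non-HTTP URLs such as data: report no status at all.
            status = getattr(response, "status", None)
            if status not in (None, 200):
                return None
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset)
    except URLError:
        return None
```

Calling `fetch_html("http://www.google.com/")` returns the page's HTML as a string that can be written straight into the response.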
