Python 3.6 Image crawling - selenium-webdriver

I am crawling images from Google Image Search.
I tried the following:
1. Open the Chrome driver with Selenium
2. Scroll down to the end of the page
3. Get the image URLs with BeautifulSoup and save the images
The problem is that the saved images are too small.
I then found that there is an original image src.
It is the src (ending with ".jpg") of the img element with class irc_mi.
But I do not know how to pull it out.
I tried using find_all with the class name, but it failed.
What should I do?
Here is the source code:
from time import sleep
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
def Remainder_All_ImagesURLs_Google(searchText):
    def scroll_page():
        # Scroll to the bottom several times so that more thumbnails load
        for i in range(7):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            sleep(3)
    def click_button():
        # Button that loads more images
        more_imgs_button_xpath = "//*[@id='smb']"
        element = driver.find_element_by_xpath(more_imgs_button_xpath)
        element.click()
        sleep(3)
    def create_soup():
        html_source = driver.page_source
        soup = BeautifulSoup(html_source, 'html.parser')
        return soup
    def find_imgs():
        soup = create_soup()
        imgs_urls = []
        for img in soup.find_all('img'):
            try:
                if img['src'].startswith('http'):
                    imgs_urls.append(img['src'])
            except KeyError:
                # <img> tag without a src attribute
                pass
        return imgs_urls
    driver = webdriver.Chrome('C:/chromedriver.exe')
    driver.maximize_window()
    sleep(2)
    searchUrl = "https://www.google.com/search?q={}&site=webhp&tbm=isch".format(searchText)
    driver.get(searchUrl)
    try:
        scroll_page()
        click_button()
        scroll_page()
    except Exception:
        click_button()
        scroll_page()
    imgs_urls = find_imgs()
    driver.close()
    return imgs_urls
def download_image(url, filename):
    full_name = str(filename) + ".jpg"
    urllib.request.urlretrieve(url, 'C:/Python/Picture' + full_name)

The problem is that BeautifulSoup won't find the full-size src or href of the image, because it is produced by a JavaScript function. My suggestion is to use Selenium to click the image element, wait for the image src to appear, and extract it.
Use
element = driver.find_element_by_class_name("some_class")
element.click()
and then search for the image src.
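The last step, searching for the image src once the click has loaded it, can be sketched as below. This is only an illustration under assumptions: it parses an HTML string such as driver.page_source with the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, the irc_mi class name is taken from the question, and ImageSrcCollector, full_size_urls and the sample HTML are made-up names and data.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collect the src of every <img> that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        src = attr_map.get("src") or ""
        # Keep only real URLs; inline thumbnails use "data:" URIs
        if self.css_class in classes and src.startswith("http"):
            self.urls.append(src)


def full_size_urls(html_source, css_class="irc_mi"):
    """Return the http(s) src values of <img> tags with the given class."""
    parser = ImageSrcCollector(css_class)
    parser.feed(html_source)
    return parser.urls


# Made-up page source: one inline thumbnail and one full-size image
sample_html = (
    '<img class="rg_i" src="data:image/jpeg;base64,AAAA">'
    '<img class="irc_mi" src="https://example.com/photo.jpg">'
)
print(full_size_urls(sample_html))
```

With Selenium, the string passed in would be driver.page_source read after the click and a short wait.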

Related

How can I get the first image of a thread using nextcord api?

I want to be able to get the first/last image of a thread in Discord using nextcord.
I think I have to go through the channel/thread history, but then how do I get the image out of the message? Is it a URL that I have to fetch with requests.get?
if message.flags.has_thread:
    async for m in message.thread.history(limit=100, oldest_first=True):
You can do it like this, based on your code:
# Body of first_image_from_thread
if message.flags.has_thread:
    async for m in message.thread.history(limit=100, oldest_first=True):
        if m.attachments:
            return m.attachments[0].url
# Caller
first_image_url = await first_image_from_thread(channel_id, thread_id)
if first_image_url:
    response = requests.get(first_image_url)
    with open('first_image.jpg', 'wb') as f:
        f.write(response.content)
    print("Downloaded first image from thread...")
else:
    print("No image found in thread...")
Here, first_image_from_thread is your own function, to which you pass channel_id and thread_id.
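You can also iterate through the attachments and check whether each one is an image by its file name or content type, for example attachment.filename.lower().endswith(".jpeg") or, when content_type is set, attachment.content_type.startswith("image/"). If it is the image you are looking for, use its "url" property to get the image URL and the "requests" module to download it.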
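A minimal sketch of that attachment check, assuming each attachment exposes filename, content_type and url the way nextcord's Attachment does. The helper names is_image and first_image_url are hypothetical, and the SimpleNamespace objects are stand-ins so the example runs without a Discord connection.

```python
from types import SimpleNamespace

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp")


def is_image(attachment):
    """Guess whether an attachment is an image from its metadata."""
    content_type = getattr(attachment, "content_type", None) or ""
    if content_type.startswith("image/"):
        return True
    # Fall back to the file extension when no content type is reported
    return attachment.filename.lower().endswith(IMAGE_EXTENSIONS)


def first_image_url(attachments):
    """Return the URL of the first image attachment, or None."""
    for attachment in attachments:
        if is_image(attachment):
            return attachment.url
    return None


# Stand-in attachments (made-up data)
attachments = [
    SimpleNamespace(filename="notes.txt", content_type="text/plain",
                    url="https://example.com/notes.txt"),
    SimpleNamespace(filename="Cat.JPEG", content_type=None,
                    url="https://example.com/Cat.JPEG"),
]
print(first_image_url(attachments))
```

Inside the history loop, the same helper could be called as first_image_url(m.attachments) instead of always taking m.attachments[0].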

How to use web-scraped data on a web page

I saw this code and would like to try it out on a web page to display some data (you can see it in the image below).
[enter image description here][1]
import requests
from bs4 import BeautifulSoup
import time
def get_count():
    url = "https://data-live.flightradar24.com/zones/fcgi/feed.js?bounds=59.09,52.64,-58.77,-47.71&faa=1&mlat=1&flarm=1&adsb=1&gnd=1&air=1&vehicles=1&estimated=1&maxage=7200&gliders=1&stats=1"
    # Request with a fake header, otherwise you will get a 403 HTTP error
    r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    # Parse the JSON
    data = r.json()
    counter = 0
    # Iterate over the elements to get the total number of flights
    for element in data["stats"]["total"]:
        counter += data["stats"]["total"][element]
    return counter
while True:
    print(get_count())
    time.sleep(8)
I am just starting to learn web scraping with beautiful soup.

Serve image from GAE datastore with Flask (python)

I'd like to avoid using Webapp from GAE, so I use this code to upload an image to the Blobstore (code snippet from: http://flask.pocoo.org/mailinglist/archive/2011/1/8/app-engine-blobstore/#7fd7aa9a5c82a6d2bf78ccd25084ac3b):
@app.route("/upload", methods=['POST'])
def upload():
    if request.method == 'POST':
        f = request.files['file']
        header = f.headers['Content-Type']
        parsed_header = parse_options_header(header)
        blob_key = parsed_header[1]['blob-key']
        return blob_key
It returns what indeed seems to be a blob key, which looks something like this:
2I9oX6J0U5nBCVw8kEndpw==
I then try to display the recently stored Blob image with this code:
@app.route("/testimgdisplay")
def test_img_display():
    response = make_response(db.get("2I9oX6J0U5nBCVw8kEndpw=="))
    response.headers['Content-Type'] = 'image/png'
    return response
Sadly this part doesn't work; I get the following error:
BadKeyError: Invalid string key 2I9oX6J0U5nBCVw8kEndpw==
Have you faced this error before? The blob key seems to be well-formatted, and I can't find a clue.
There was a simple mistake on the call for getting the Blob, I wrote:
db.get("2I9oX6J0U5nBCVw8kEndpw==")
and the right call was instead:
blobstore.get("2I9oX6J0U5nBCVw8kEndpw==")
For those looking for a complete Upload/Serving image via GAE Blobstore and Flask without using Webapp, here is the complete code:
Render the template for the upload form:
@app.route("/upload")
def upload():
    uploadUri = blobstore.create_upload_url('/submit')
    return render_template('upload.html', uploadUri=uploadUri)
Place your uploadUri in the form path (html):
<form action="{{ uploadUri }}" method="POST" enctype="multipart/form-data">
Here is the function to handle the upload of the image (I return the blob_key for practical reasons, replace it with your template):
@app.route("/submit", methods=['POST'])
def submit():
    if request.method == 'POST':
        f = request.files['file']
        header = f.headers['Content-Type']
        parsed_header = parse_options_header(header)
        blob_key = parsed_header[1]['blob-key']
        return blob_key
Now say you serve your images with a path like this:
/img/imagefilename
Then your image serving function is :
@app.route("/img/<bkey>")
def img(bkey):
    blob_info = blobstore.get(bkey)
    response = make_response(blob_info.open().read())
    response.headers['Content-Type'] = blob_info.content_type
    return response
Finally, anywhere you need to display an image in a template, you simply put the code:
<img src="/img/{{ bkey }}" />
I don't think Flask is any better or worse than Webapp in serving up Blobstore images, since they both use the Blobstore API for Serving a Blob.
What you're calling a Blobkey is just a string, which needs to be converted into a key (called resource here):
import urllib
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
class ServeHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        self.send_blob(blob_info)
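The urllib.unquote call matters when the key arrives percent-encoded in the URL path, which can happen because the key ends in "=" characters. A small sketch of that round trip follows. It is written for Python 3, where these functions live in urllib.parse (the handler above is Python 2), and it only illustrates the string handling without touching the Blobstore.

```python
from urllib.parse import quote, unquote

# Blob key string taken from the question above
blob_key = "2I9oX6J0U5nBCVw8kEndpw=="

# Percent-encode every reserved character so the key is safe in a URL path
encoded = quote(blob_key, safe="")

# What a handler would do with the path segment it receives
decoded = unquote(encoded)

print(encoded)
print(decoded == blob_key)
```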

gae-boilerplate existing blob not showing

The image is not displayed at the following URL:
http://127.0.0.1:8080/serve/CrObzPCoJfjG4ESUUb0hjw==
The image does exist in the Blobstore; this can be checked in the admin console.
My route
Dope on redirect routes
RedirectRoute('/serve/[a-zA-Z0-9-_]', handlers.ServeHandler, name='ServeHandler'),
My code:
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
import logging
import urllib
class ServeHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        logging.info("SERVE " + str(resource))
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        self.send_blob(blob_info)
class FetchHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload_files = self.get_uploads('file')  # 'file' is the file upload field in the form
        blob_info = upload_files[0]
        logging.info("FOUND blob info" + str(blob_info))
        self.redirect('/serve/%s' % blob_info.key())
class ImageHandler(BaseHandler):
    @user_required
    def get(self, **kwargs):
        user_session = self.user
        user_session_object = self.auth.store.get_session(self.request)
        upload_url = blobstore.create_upload_url('/fetch/')
        user_info = models.User.get_by_id(long(self.user_id))
        user_info_object = self.auth.store.user_model.get_by_auth_token(
            user_session['user_id'], user_session['token'])
        try:
            params = {
                "upload_url": upload_url,
                "user_session": user_session,
                "user_session_object": user_session_object,
                "user_info": user_info,
                "user_info_object": user_info_object,
                "userinfo_logout-url": self.auth_config['logout_url'],
            }
            return self.render_template('image.html', **params)
        except (AttributeError, KeyError), e:
            return "Secure zone error:" + " %s." % e
I think your problem might be on this line:
self.redirect('/serve/%s' % blob_info.key())
This follows from a recent change, assuming you updated App Engine to the latest release:
The Blobstore service now returns the created filename instead of the blobKey when using Cloud Storage [link][1]
Have a look at the recent release notes and the changes that came with it.
I think the URL that you are passing to create_upload_url is the wrong one, since you are defining the route as /upload/.
Add the trailing slash and it should work:
upload_url = blobstore.create_upload_url('/upload/')
Comparing your code (from an earlier revision of your question) to some code I have working, I suspect that you might not want the trailing / on the /upload/ route (i.e., use /upload instead).
I'm not familiar with RedirectRoute, though.

Get url of Image in GAE (Python 2.7)

I am trying to get the URL of an image (a blob field in GAE):
class Product(db.Model):
    name = db.StringProperty()
    price = db.FloatProperty()
    added = db.DateTimeProperty(auto_now_add=True)
    image = db.BlobProperty(default=None)
url = images.get_serving_url(movie.image)
Handler for serving the image:
def result(request):
    product = Product()
    product.name = "halva"
    url = 'http://echealthinsurance.com/wp-content/uploads/2009/11/minnesota.jpg'
    product.image = db.Blob(urlfetch.Fetch(url).content)
    product.put()
    template = loader.get_template("result.html")
    context = RequestContext(request, {"result": u"Add"})
    return HttpResponse(template.render(context))
But I get an exception:
UnicodeDecodeError:
When I try to ignore this exception (it was a bug in Python 2.7), I get an exception in another place.
After that I tried decoding the image as 'latin-1' ('utf-8' doesn't work):
enc_img = movie.image.decode("latin-1")
url = images.get_serving_url(enc_img)
Result: the URL looks like the contents of a binary file:
"ÝêÓ9>èýÑNëCf Äàr0xã³3Ï^µ7±\íQÀ¡>.....ÕÝ£°Ëÿ"I¢¶L`ù¥ºûMþÒ¸ÿ+ÿL¢ï£ÿÙ' alt="" />"
How do I get a URL to show a dynamic image in a template?
You are confusing two different things here.
If you are storing your image in a db.BlobProperty (the code doesn't show you doing this, but your schema uses db.BlobProperty), your handler has to serve the image itself.
However, you are using images.get_serving_url, which takes a BlobKey instance. A BlobKey comes from storing an image in the Blobstore (https://developers.google.com/appengine/docs/python/blobstore/blobkeyclass), which is a completely different thing from what you are doing.
You will need to decide what you want to do: either store the image (max size 1MB) in a BlobProperty and provide a handler that serves it, or upload it to the Blobstore and serve it from there.
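To see why decoding the stored bytes did not help, here is a small sketch in plain Python 3, independent of App Engine. It assumes the blob holds ordinary JPEG data; the four bytes used are the usual start of a JPEG file, not the asker's actual image. Decoding as UTF-8 fails, and decoding as latin-1 succeeds only because latin-1 maps every byte to some character, so the result is still image data rather than a URL.

```python
# First bytes of a typical JPEG file (illustrative, not the real image)
jpeg_header = bytes([0xFF, 0xD8, 0xFF, 0xE0])

# 0xFF can never appear in valid UTF-8, so this decode fails
try:
    jpeg_header.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin-1 maps each byte to one character, so it always "works"
as_text = jpeg_header.decode("latin-1")

print(utf8_ok)
print(len(as_text))
# Encoding back gives the original bytes: the text is just the image data
print(as_text.encode("latin-1") == jpeg_header)
```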
images.get_serving_url takes a BlobKey. Try:
enc_img = movie.image
url = images.get_serving_url(enc_img.key())

Resources