I am making an API call to a large database that sends back data in JSON format. Because the data is so large, the database sends the JSON in separate batches, each batch containing a nextPageUrl pointing to the next batch. I want to crawl through the batches, collect the URL of each batch into a list, then loop over that list again to parse all the JSON data and populate my own SQLite database with the results. However, I get this error message:
Traceback (most recent call last):
File "Database_download_v2.py", line 52, in <module>
if (len(json_dict['nextPageUrl']) > 0):
KeyError: 'nextPageUrl'
The code I use is:
import json
import requests

load_page = requests.get(form_response_tree, headers=headers).content
page_decode = load_page.decode()
json_dict = json.loads(page_decode)
url_subseq_page = json_dict['nextPageUrl']
url_list = list()
url_list.append(url_subseq_page)

for all_pages in url_list:
    load_page = requests.get(all_pages, headers=headers).content
    page_decode = load_page.decode()
    json_dict = json.loads(page_decode)
    if (len(json_dict['nextPageUrl']) > 0):
        url_subseq_page = json_dict['nextPageUrl']
        url_list.append(url_subseq_page)
    else:
        continue
Any idea what is wrong here?
It is because the last batch has no nextPageUrl key at all, so json_dict['nextPageUrl'] raises a KeyError instead of returning an empty string. Use dict.get(), which returns None when the key is missing, to guard the check.
Solution
if json_dict.get('nextPageUrl'):
    url_list.append(json_dict['nextPageUrl'])
I am trying to iterate through all the cells of each CSV row (name, screen_name and image URL). Different errors show up; I tried pandas, but I am still unable to finish the job. My CSV looks like this:
screen_name,name,image_url_https
Jan,Jan,https://twimg.com/sticky/default_profile_images/default_profile_normal.png
greg,Gregory Kara,https://twimg.com/profile_images/60709109/Ferran_Adria_normal.jpg
hillheadshow,Hillhead 2020,https://twimg.com/profile_images/1192061150915178496/cF6jOCRV_normal.jpg
hectaresbe,Hectaresbe,https://twimg.com/profile_images/1190957150996226048/lJnRnFwi_normal.jpg
Sdzz,Sanne,https://twimg.com/profile_images/1159005129879801856/8p6KC1ei_normal.jpg
and the part of the code that I need to change is:
import json
import time, os
import codecs
import requests
import pandas as pd

screen_name = 'mylist'
file = pd.read_csv("news2.csv", header=0)
col = file.head(0)
columns = list(col)
fp = codecs.open(screen_name + '.csv', 'w', encoding="utf-8")
i = 0
while True:
    try:
        i += 1
        print(i)
        name = ['name']
        uname = ['screen_name']
        urlimage = ['image_url_https']
The values are OK with @Snake_py's code; next I am making a request:
myrequest = 'https://requesturl.com/' + uname
# print(myrequest)
resp = requests.get(myrequest)
I get the following error:
raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '0    https://requesturl.com/Jan
Name: name, dtype: object'
timeout error caught.
The easiest way to iterate through a CSV with Python would be:
name = []
uname = []
urlimage = []

with open('news2.csv', 'r') as file:
    next(file)  # skip the header row
    for row in file:
        row = row.strip().split(",")
        uname.append(row[0])     # screen_name column
        name.append(row[1])      # name column
        urlimage.append(row[2])  # image_url_https column

print(name)
print(uname)
print(urlimage)
First I created three empty lists. Then I open the file and iterate over each row; every row is split into a list of fields, so the normal index operator [] pulls out the needed part to append to the matching list. Note that the file shown is comma-separated, so split on "," and skip the header line first.
With the method above you might run into encoding problems, so I would recommend method 2, although you do not actually iterate over the rows then.
Alternatively you could just do:
import pandas as pd

file = pd.read_csv("news2.csv", header=0)
name = file['name']
uname = file['screen_name']
urlimage = file['image_url_https']
For the second method, you need to make sure that the column names match your header's spelling exactly; the names above are taken from the header row of news2.csv.
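Neither method above fixes the InvalidSchema error itself: the traceback shows an entire pandas Series (note the "0 ... dtype: object" inside the URL) being concatenated into the request string. A hedged sketch of the request loop, iterating one value at a time and keeping the placeholder endpoint from the question:
import pandas as pd
import requests

file = pd.read_csv("news2.csv", header=0)
for uname in file['screen_name']:
    myrequest = 'https://requesturl.com/' + uname  # uname is now a single string
    resp = requests.get(myrequest)
    # ... handle resp here ...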
The Configuration model below allows a .ZIP file to be uploaded and ties the network device .TXT files extracted from it to the location the device configurations came from.
class Configuration(models.Model):
    configfile = models.FileField('Configuration File Upload', upload_to='somecompany/configs/', help_text='Select a .ZIP file which contains .TXT file configuration dumps from devices which belong to a single location.')
    location_name = models.ForeignKey('Location', help_text='Associate the .ZIP file selected above to the location from which the device .TXT file configuration dumps were taken.')
I extended the model's default save() method to allow for processing of the .ZIP (code not shown for brevity). I've parsed the extracted .TXT files, collected all of the desired information into variables, and I'm trying to insert that information into my database, but it's failing. Below is an example of the values collected from one of the extracted .TXT files (modified slightly for privacy) and my attempt at DB insertion:
dbadd_ln = 'Red Rock'
dbadd_dn = 'DEVICE4'
dbadd_manu = 'cisco'
dbadd_os = 'nxos'
dbadd_dt = '-'
dbadd_prot = '-'
dbadd_cred = '-'
dbadd_ser = 'ABCD1234'
dbadd_addr = '10.10.10.10'
dbadd_model = 'N7K-C7010'
dbadd_ram = '2048256000'
dbadd_flash = '1109663744/1853116416'
dbadd_image = 'n7000-s1-dk9.5.2.9.bin'
dbadd = Device(location_name=dbadd_ln, device_name=dbadd_dn, device_type=dbadd_dt, protocol=dbadd_prot, credential=dbadd_cred, serial=dbadd_ser, address=dbadd_addr, manufacturer=dbadd_manu, model=dbadd_model, ram=dbadd_ram, flash=dbadd_flash, os=dbadd_os, image=dbadd_image)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "c:\code-projects\MYVIRTUALENV\lib\site-packages\django\db\models\base.py", line 431, in __init__
setattr(self, field.name, rel_obj)
File "c:\code-projects\MYVIRTUALENV\lib\site-packages\django\db\models\fields\related_descriptors.py", line 207, in __set__
self.field.remote_field.model._meta.object_name,
ValueError: Cannot assign "'Red Rock'": "Device.location_name" must be a "Location" instance.
'Red Rock' is a legitimate Location entry which already exists in my database...
>>> Location.objects.filter(location_name='Red Rock')
[Location: Red Rock]
... so I guess I'm unclear on what this really means:
"Device.location_name" must be a "Location" instance.
Any assistance to help resolve this issue is appreciated. Thanks in advance.
Did some more searching and found this:
Cannot assign "u''": "Company.parent" must be a "Company" instance
Now I see: when I pass an actual Location instance instead of the string, I get past the particular error I was describing. Now on to the next problem. :)
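For anyone hitting the same wall, a minimal sketch of the fix: fetch the existing Location and pass the instance, not the string, to the ForeignKey field (the lookup field location_name is taken from the filter query above).
location = Location.objects.get(location_name=dbadd_ln)  # an actual Location instance
dbadd = Device(location_name=location, device_name=dbadd_dn, device_type=dbadd_dt,
               protocol=dbadd_prot, credential=dbadd_cred, serial=dbadd_ser,
               address=dbadd_addr, manufacturer=dbadd_manu, model=dbadd_model,
               ram=dbadd_ram, flash=dbadd_flash, os=dbadd_os, image=dbadd_image)
dbadd.save()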
I want to read a 4 MB file using Python xlrd on GAE. I am getting the file from the Blobstore. The code I use is given below.
book = xlrd.open_workbook(file_contents=temp_file)
sh = book.sheet_by_index(0)
for col_no in range(sh.ncols):
    # ... process each column ...
It gives me a DeadlineExceededError:
book = xlrd.open_workbook(file_contents=file_data)
File "/base/data/home/apps/s~appid/app-version.369475363369053908/xlrd/__init__.py", line 416, in open_workbook
ragged_rows=ragged_rows,
File "/base/data/home/apps/s~appid/app-version.369475363369053908/xlrd/xlsx.py", line 756, in open_workbook_2007_xml
x12sheet.process_stream(zflo, heading)
File "/base/data/home/apps/s~appid/app-version.369475363369053908/xlrd/xlsx.py", line 520, in own_process_stream
for event, elem in ET.iterparse(stream):
DeadlineExceededError
But I am able to read smaller files. Actually I only need the first few rows (30 to 50) of the file. Is there any method, other than adding it as a task and fetching the details through the Channel API, to get them without causing the deadline error?
What can I do to handle this?
I read an Excel file of about 1000 rows and the library handles it okay.
I'll leave a link that might be useful: https://github.com/cjhendrix/HXLator-SpaceAppsVersion/blob/master/gae/main.py
In that code you can see that traversing the rows and columns gives you a list of cell values for each row.
Example:
wb = xlrd.open_workbook(file_contents=inputfile.read())
sh = wb.sheet_by_index(0)
for rownum in range(sh.nrows):
    val_row = sh.row_values(rownum)  # one list of cell values per row
    self.response.write(val_row[1])  # index depends on the column you need
Regards!
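PS: since only the first 30 to 50 rows are needed, the loop can be capped, as in the sketch below. Note this is only a partial mitigation: open_workbook still parses the entire file up front, so a very large file may still hit the deadline.
wb = xlrd.open_workbook(file_contents=inputfile.read())
sh = wb.sheet_by_index(0)
for rownum in range(min(50, sh.nrows)):  # stop after the first 50 rows
    val_row = sh.row_values(rownum)
    self.response.write(val_row[1])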
I have harvested data, not particularly clean, which has been bulk-uploaded into the Datastore. However, I get the following issue when simply trying to loop through all the records. I don't much care about validation at this point; all I want is to perform a bulk operation, but GAE appears not to let me even loop through the records, and I want to get to the bottom of this. To my knowledge all records have data in the country field. I could switch off validation, but can someone explain why this is happening and why GAE is being so sensitive? Thanks.
result = Company.all()
my_count = result.count()
if result:
    for r in result:
        self.response.out.write("hello")
The data model has these properties:
class Company(db.Model):
    companyurl = db.LinkProperty(required=True)
    companyname = db.StringProperty(required=True)
    companydesc = db.TextProperty(required=True)
    companyaddress = db.PostalAddressProperty(required=False)
    companypostcode = db.StringProperty(required=False)
    companyemail = db.EmailProperty(required=True)
    companycountry = db.StringProperty(required=True)
The error message is below
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 701, in __call__
handler.get(*groups)
File "/base/data/home/apps/XXX/1.358667163009710608/showcompanies.py", line 99, in get
for r in result:
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 2312, in next
return self.__model_class.from_entity(self.__iterator.next())
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 1441, in from_entity
return cls(None, _from_entity=entity, **entity_values)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 973, in __init__
prop.__set__(self, value)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 613, in __set__
value = self.validate(value)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 2815, in validate
value = super(StringProperty, self).validate(value)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 640, in validate
raise BadValueError('Property %s is required' % self.name)
BadValueError: Property companycountry is required
If you put the bulk process you wish to run in its own script, you can construct a modified version of your Company class without validation. Since db.Model classes are just wrappers around the datastore based on the name of the class, you can have different classes in different parts of your code with different behaviors.
So you might have a model.py file with:
class Company(db.Model):
    companyurl = db.LinkProperty(required=True)
    # ...
    companycountry = db.StringProperty(required=True)

# Normal operations go here
And, another bulk_process.py file with:
class Company(db.Model):
    companyurl = db.LinkProperty()
    # ...
    companycountry = db.StringProperty()

result = Company.all()
my_count = result.count()
if result:
    for r in result:
        self.response.out.write("hello")
Because this second model class lacks the validation, it should run just fine. And, because the code is logically separated you don't have to worry about unintentional side-effects from removing the validation in the rest of your code. Just make sure that your bulk process doesn't accidentally write back data without validation (unless you're OK with this).
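As a usage example, here is a hedged sketch of a one-off repair pass in bulk_process.py that backfills the missing field so the validating model can be used again afterwards ('unknown' is an arbitrary placeholder, not something from the original post):
# bulk_process.py -- uses the validation-free Company model defined above
for company in Company.all():
    if not company.companycountry:
        company.companycountry = 'unknown'  # placeholder default; choose a real one
        company.put()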
Following these instructions:
http://neogregious.blogspot.com/2011/04/migrating-app-to-high-replication.html
I have managed to migrate to the high-replication datastore; however, I am now getting the following exception:
datastore_errors.BadArgumentError('ancestor argument should match app ("%r" != "%r")' %
                                  (ancestor.app(), app))
The model data looks something like this:
class base_business(polymodel.PolyModel):
    created = db.DateTimeProperty(auto_now_add=True)

class business(base_business):
    some_data = db.StringProperty()
    # etc.

class business_image(db.Model):
    image = db.BlobProperty(default=None)
    mimetype = db.StringProperty()
    comment = db.StringProperty(required=False)

# the image is associated like so
image_item = business_image(parent=business_item, ...)
image_item.put()
The new app name has not been assigned to the ancestor model data. At the moment the data is still returned, but the logs are filling up with this exception message.
The actual stack trace using logging.exception:
2011-11-03 16:45:40.211
======= get_business_image exception [ancestor argument should match app ("'oldappname'" != "'s~newappname'")] =======
Traceback (most recent call last):
File "/base/data/home/apps/s~newappname/3.354412961756003398/oldappname/entities/views.py", line 82, in get_business_image
business_img = business_image.gql("WHERE ANCESTOR IS :ref_business and is_primary = True", ref_business = db.Key(business_key)).get()
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/init.py", line 2049, in get
results = self.fetch(1, config=config)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/init.py", line 2102, in fetch
raw = raw_query.Get(limit, offset, config=config)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 1668, in Get
config=config, limit=limit, offset=offset, prefetch_size=limit))
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 1600, in GetBatcher
return self.GetQuery().run(_GetConnection(), query_options)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 1507, in GetQuery
order=self.GetOrder())
File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 93, in positional_wrapper
return wrapped(*args, **kwds)
File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_query.py", line 1722, in init
ancestor=ancestor)
File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 93, in positional_wrapper
return wrapped(*args, **kwds)
File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_query.py", line 1561, in init
(ancestor.app(), app))
BadArgumentError: ancestor argument should match app ("'oldappname'" != "'s~newappname'")
Is there a way to manually set the app on the model data? Could I do something like this to resolve it?
if ancestor.app() != app:
    set_app('my_app')
    put()
Before I do this or apply any other hack, is there something I should have done as part of the data migration?
This sort of error usually occurs because you're using fully qualified keys somewhere that have been stored in the datastore as strings (instead of ReferenceProperty), or outside the datastore, such as in URLs. You should be able to work around this by reconstructing any keys from external sources such that you ignore the App ID, something like this:
my_key = db.Key.from_path(*db.Key(my_key).to_path())
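For example, applied to the GQL lookup from the traceback above (a sketch, assuming business_key is a key string that arrived from an external source such as a URL):
ref_key = db.Key.from_path(*db.Key(business_key).to_path())  # rebuilds the key under the current app id
business_img = business_image.gql(
    "WHERE ANCESTOR IS :ref_business and is_primary = True",
    ref_business=ref_key).get()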