Unicode fonts in pdf at GAE with web2py/pyfpdf - google-app-engine

I'm writing an app that produces a PDF file containing text with Unicode characters. On the GAE devserver it works fine, but after deployment it fails to load the font file (it crashes inside pyfpdf's add_font()).
The code is:
# -*- coding: utf-8 -*-
def fun1():
    from gluon.contrib.pyfpdf import FPDF, HTMLMixin
    class MyFPDF(FPDF, HTMLMixin):
        pass
    pdf = MyFPDF()
    pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf', uni=True)
    pdf.add_page()
    pdf.set_font('DejaVu', '', 16)
    pdf.write(10, 'test-ąśł')
    response.headers['Content-Type'] = 'application/pdf'
    return pdf.output(dest='S')
The font files (along with DejaVuSansCondensed.pkl, which was generated after the first run on the web2py server) are in /gluon/contrib/fpdf/font. I didn't add anything to routers.py (I'm using the pattern-based system), and app.yaml is unchanged. This is what I get:
In FILE: /base/data/home/apps/s~myapp/web2py-04.369240954601780983/applications/app3/controllers/default.py
Traceback (most recent call last):
File "/base/data/home/apps/s~myapp/web2py-04.369240954601780983/gluon/restricted.py", line 212, in restricted
exec ccode in environment
File "/base/data/home/apps/s~myapp/web2py-04.369240954601780983/applications/app3/controllers/default.py", line 674, in <module>
File "/base/data/home/apps/s~myapp/web2py-04.369240954601780983/gluon/globals.py", line 194, in <lambda>
self._caller = lambda f: f()
File "/base/data/home/apps/s~myapp/web2py-04.369240954601780983/applications/app3/controllers/default.py", line 493, in fun1
pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf', uni=True)
File "/base/data/home/apps/s~myapp/web2py-04.369240954601780983/gluon/contrib/fpdf/fpdf.py", line 432, in add_font
font_dict = pickle.load(fh)
File "/base/data/home/runtimes/python27p/python27_dist/lib/python2.7/pickle.py", line 1378, in load
return Unpickler(file).load()
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/pickle.py", line 966, in load_string
raise ValueError, "insecure string pickle"
ValueError: insecure string pickle
As I said, locally (both web2py/rocket and the GAE devserver) it works well. After deployment, only something like this works:
pdf =MyFPDF()
pdf.add_page()
pdf.set_font('Arial','',16)
pdf.write(10,'testąśł')
But only without "unusual" characters...
The best solution would be to add my own font files (like DejaVu), but basically I need Unicode characters in any font. Maybe there is a "half-solution" that uses some generic Unicode fonts available on GAE, if such a thing exists.

Thanks for the suggestion, Tim!
I found a solution. It isn't the best one, but it works.
The problem is with using pickle on GAE. The best solution would probably be to override add_font() so that, on GAE, it writes to the datastore instead of the filesystem. Additionally, the ValueError: insecure string pickle error can still occur; I tried b64 encoding according to this, but I still got errors. So my solution is to override add_font() with the following parts commented out or deleted:
if os.path.exists(unifilename):
    fh = open(unifilename)
    try:
        font_dict = pickle.load(fh)
    finally:
        fh.close()
else:
and
try:
    fh = open(unifilename, "w")
    pickle.dump(font_dict, fh)
    fh.close()
except IOError, e:
    if not e.errno == errno.EACCES:
        raise  # Not a permission error.
Because of this, the function recomputes the font metrics on every call instead of just reading them from the pickle, but it works on GAE.
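A gentler variant of the same idea is to treat the pickle purely as an optional cache: if it cannot be read, recompute, and if it cannot be written, carry on. The sketch below is generic Python 3, not pyfpdf's actual code; load_or_compute, cache_path, and compute are hypothetical names used only for illustration, and the broad except is an assumption that any failure to read the cache should simply trigger a recompute.

```python
import pickle


def load_or_compute(cache_path, compute):
    """Return data from the pickle at cache_path, or call compute() if that fails.

    Hypothetical helper: the cache is optional, so an unreadable cache or a
    read-only filesystem never stops the caller from getting a result.
    """
    try:
        # Binary mode avoids newline translation corrupting the pickle.
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    except Exception:
        # Missing file, corrupted data, incompatible pickle: fall through
        # and recompute instead of crashing.
        pass

    data = compute()

    try:
        with open(cache_path, "wb") as fh:
            pickle.dump(data, fh, pickle.HIGHEST_PROTOCOL)
    except OSError:
        # Read-only or restricted filesystem: skip caching.
        pass
    return data
```

With this shape, the expensive calculation only runs when the cache is missing or unusable, which keeps the fast path on the local server and still works where the cache file cannot be trusted.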

Related

Trouble with running words through text files and counting them

Python 3+
This is the error I get
This is my code
I want the user to input some words, and then the program should check each word against my two text files. If the word exists in either of them, I want the program to add 1 to the corresponding positive/negative count.
Thank you for your help :)
It seems you have hit a decoding error when opening one of the input files in the wordlist function. It is usually hard to determine the encoding used for a particular file, so you could:
1. Try opening the file with a different encoding, such as ISO-8859-15.
def OpenFile():
    try:
        with open("My File.txt", mode="r", encoding="ISO-8859-15") as f:
            data = f.read()  # process My File here
    except UnicodeDecodeError:
        print("Something went wrong. Try a different file encoding")
    else:
        # everything was okay, return the required data
        return data
    finally:
        # clean up here
        pass
2. Look at modules that try to determine the correct encoding for a file, such as the chardet module.
Install the chardet module:
sudo pip3 install chardet
You can run it at the command line with your file as the argument to determine the encoding:
cd /path/to/File/
chardetect My\ File.txt
This should return the likely encoding for the given file.
3. You can use the chardet module inside your Python code. However, this is only recommended when you will be opening a file you do not have access to in advance, e.g. on a client's computer where the user picks the file, because reopening the same file and re-detecting its encoding every time will make your program slow.
First of all, positive_count and negative_count should be integers, not lists. If you wish to count, adding 1 to a list isn't really what you're trying to accomplish.
Second, the UnicodeDecodeError occurs because the underlying file is not encoded as utf-8. Did you try utf-16 or utf-16-le? If you're on Windows, utf-16-le is probably the encoding used, unless you're using code pages, in which case guessing will be a nightmare.
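Since the original code was not posted, here is a minimal sketch of the counting with integer counters. It assumes each word-list file holds one word per line; load_words and count_sentiment are hypothetical names, and the encoding argument is whatever turns out to be right for your files.

```python
def load_words(path, encoding="utf-8"):
    """Read a word list (assumed: one word per line) into a lower-cased set."""
    with open(path, encoding=encoding) as f:
        return {line.strip().lower() for line in f if line.strip()}


def count_sentiment(user_words, positive_words, negative_words):
    """Return (positive_count, negative_count) as plain integers."""
    positive_count = 0
    negative_count = 0
    for word in user_words:
        w = word.lower()
        if w in positive_words:
            positive_count += 1
        if w in negative_words:
            negative_count += 1
    return positive_count, negative_count


if __name__ == "__main__":
    # In-memory sets stand in for the two text files here.
    counts = count_sentiment(["Good", "bad", "meh"], {"good", "great"}, {"bad"})
    print(counts)
```

Using sets makes each membership test cheap, and the files are read once instead of once per input word.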

Trying to upload compressed data (unicode) via bulkuploader

I ran into an issue where the data being uploaded to a db.Text property was over 1 MB, so I compressed it using zlib. The bulkloader didn't support the Unicode data being uploaded by default, so I changed its source code to use unicodecsv rather than Python's built-in csv module. The problem I'm running into is that Google App Engine's bulkloader is unable to handle the Unicode characters (even though the db.Text property is Unicode).
[ERROR ] [Thread-12] DataSourceThread:
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 1611, in run
self.PerformWork()
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 1730, in PerformWork
for item in content_gen.Batches():
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 542, in Batches
self._ReadRows(key_start, key_end)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 452, in _ReadRows
row = self.reader.next()
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/bulkload/csv_connector.py", line 219, in generate_import_record
for input_dict in self.dict_generator:
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/unicodecsv/__init__.py", line 188, in next
row = csv.DictReader.next(self)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
row = self.reader.next()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/unicodecsv/__init__.py", line 106, in next
row = self.reader.next()
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/bulkload/csv_connector.py", line 55, in utf8_recoder
for line in codecs.getreader(encoding)(stream):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 612, in next
line = self.readline()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 527, in readline
data = self.read(readsize, firstline=True)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 474, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 29: invalid start byte
I know that for local testing I could modify the Python files to use the unicodecsv module instead, but that doesn't solve the problem for GAE's Datastore in production. Is anyone aware of an existing solution to this problem?
Solved this the other week: you just need to base64-encode the compressed data, and then you won't have any issues with the bulkloader. The size increases by 30-50%, but since zlib had already compressed my data to 10% of the original, this wasn't too bad.
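For reference, a minimal sketch of that round trip in Python 3 (the original was Python 2.7 on GAE). pack_text and unpack_text are hypothetical helper names; the point is only that the value written to the CSV is plain ASCII, so the UTF-8 decoder in the bulkloader never sees raw zlib bytes.

```python
import base64
import zlib


def pack_text(text):
    """Compress text with zlib, then base64-encode it into an ASCII string."""
    compressed = zlib.compress(text.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def unpack_text(packed):
    """Reverse pack_text: base64-decode, decompress, decode as UTF-8."""
    return zlib.decompress(base64.b64decode(packed)).decode("utf-8")


if __name__ == "__main__":
    original = "São Paulo, José " * 1000
    packed = pack_text(original)
    print(len(original), len(packed))
    print(unpack_text(packed) == original)
```

The application then has to call the equivalent of unpack_text when it reads the property back.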

How to extract data from multiple files with Python?

I am new to Python, which is also my first programming language. I have a set of txt files (academic papers), and I need to extract the paper ID (e.g. ID: a1111111) and the abstract (e.g. ABSTRACT: .....) from each. I have no idea how to extract this data from multiple files in multiple folders. Thanks A LOT!
So your question has two parts: reading files and accessing folders.
Reading files
The methods and objects used for reading files are described in chapter 7 of Python's tutorial:
http://docs.python.org/2/tutorial/inputoutput.html
The basic gist is that you use the open function to access files, for example one in the same directory:
f = open('stuff.txt', 'r')
Here stuff.txt is the name of a file in the same directory as your Python file.
Calling print f.read() will display the text of the file (as a string). Feel free to assign f.read() to a variable to capture the data.
>>> x = f.read()
>>> print x
This is the entire file.\n
It's best to read the documentation for all these methods, because there are subtleties. For example, calling f.read() once will return the entire file contents, but calling f.read() again will return an empty string, because the end of the file has been reached.
Accessing folders
Can you explain exactly how you'd like to access the folders? In this case, it would be much easier to just put all your files in the same directory as the one where you run your Python file.
However, the basic way to move around in Python is os.chdir(path), which is basically cd'ing around. You must import os before you use it.
Leave a comment if you'd like more information.
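If the files do have to stay in several folders, os.walk can visit them all without any chdir. The sketch below is Python 3 and rests on assumptions the question does not confirm: each paper is a .txt file, the ID sits on a line starting with "ID:", and the whole abstract sits on a single line starting with "ABSTRACT:". extract_fields and collect_papers are hypothetical names.

```python
import os


def extract_fields(path, encoding="utf-8"):
    """Return (paper_id, abstract) from one file, or None for a missing field.

    Assumes single-line fields introduced by "ID:" and "ABSTRACT:".
    """
    paper_id = None
    abstract = None
    with open(path, encoding=encoding) as f:
        for line in f:
            if line.startswith("ID:"):
                paper_id = line[len("ID:"):].strip()
            elif line.startswith("ABSTRACT:"):
                abstract = line[len("ABSTRACT:"):].strip()
    return paper_id, abstract


def collect_papers(root):
    """Walk root and all its subfolders, extracting fields from every .txt file."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".txt"):
                results.append(extract_fields(os.path.join(dirpath, name)))
    return results
```

Calling collect_papers with the top-level folder returns one (ID, abstract) tuple per paper; abstracts that span several lines would need extra handling.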

Can't get python to read my .txt file on OS X

I'm trying to get IDLE to read my .txt file, but for some reason it won't. I tried the same thing at school on a Windows computer with Notepad and it worked fine, but on my Mac, IDLE won't read (or find) my .txt file.
I made sure they were in the same folder/directory and that the file was formatted as plain text, but I still get errors. Here's the code I was using:
def loadwords(filename):
    f = open(filename, "r")
    print(f.read())
    f.close()
    return
filename = input("enter the filename: ")
loadwords(filename)
and here is the error I got after I enter the file name "test.txt" and press enter:
Traceback (most recent call last):
File "/Computer Sci/My programs/HW4.py", line 8, in <module>
loadwords(filename)
File "/Computer Sci/My programs/HW4.py", line 4, in loadwords
print(f.read())
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
The error you see means your Python interpreter is trying to decode the file as ASCII, but the text file you're trying to read is not ASCII encoded. It's probably UTF-8 encoded (the default on recent OS X systems), although a first byte of 0xff is not valid UTF-8 and usually indicates a UTF-16 byte order mark, so utf-16 is worth trying if utf8 fails.
Adding the encoding to the open call should work better:
f = open(filename, "r", encoding="utf8")
Another way to correct this would be to go back to TextEdit with your file and select Duplicate (or Save As, shift-cmd-S), where you can save the file again, this time choosing the ASCII encoding. You might need to add ASCII to the encodings option list if it is not present.
This other question and accepted answer provides some more thoughts about the way to choose the encoding of the file you're reading.
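One cheap way to choose is to look at the first bytes of the file for a byte order mark (BOM) before opening it as text. This is only a sketch: guess_encoding_from_bom is a hypothetical helper, it recognises just the UTF-8, UTF-16, and UTF-32 BOMs from the standard codecs module, and it falls back to an assumed default when no BOM is present.

```python
import codecs


def guess_encoding_from_bom(path, default="utf-8"):
    """Return an encoding name based on the file's BOM, or default if none."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 one.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return default


def read_text(path):
    """Open the file with the guessed encoding and return its contents."""
    with open(path, "r", encoding=guess_encoding_from_bom(path)) as f:
        return f.read()
```

A file without a BOM can still be in any encoding, so the default remains a guess.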
You need to open the file with the appropriate encoding. Also, you should return something from the method otherwise you won't be able to do anything with the file.
Try this version:
def loadwords(filename):
    with open(filename, 'r', encoding='utf8') as f:
        lines = [line for line in f if line.strip()]
    return lines
filename = input('Enter the filename: ')
file_lines = loadwords(filename)
for eachline in file_lines:
    print('The line is {}'.format(eachline))
The expression [line for line in f if line.strip()] is a list comprehension. It is the short version of:
lines = []
for line in f:
    if line.strip():  # skip blank lines
        lines.append(line)
textfile = "textfile.txt"
file = open(textfile, "r", encoding = "utf8")
read = file.read()
file.close()
print(read)
This encoding limitation mainly affects Python version 2.*
If your Mac is running Python version 3.*, open() falls back to the locale's preferred encoding when none is given, so you do not have to add the extra encoding argument as long as that default matches the file's encoding.
The function below runs in Python 3 without any edits.
def loadwords(filename):
    f = open(filename, "r")
    print(f.read())
    f.close()
    return
filename = input("enter the filename: ")
loadwords(filename)

GAE Full Text Search development console UnicodeEncodeError

I have an index with many accented words (e.g. São Paulo, José, etc.).
The Search API works fine, but when I try to run test queries in the development console, I can't access the index data.
This error only occurs in the development environment. On production GAE everything works fine.
Below is the traceback:
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 701, in __call__
handler.get(*groups)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/admin/__init__.py", line 1704, in get
'values': self._ProcessSearchResponse(resp),
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/admin/__init__.py", line 1664, in _ProcessSearchResponse
value = TruncateValue(doc.fields[field_name].value)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/admin/__init__.py", line 158, in TruncateValue
value = str(value)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in position 5: ordinal not in range(128)
