Let's say the app received a message that has attachments (mail_message.attachments). Now I would like to save the message in the datastore. I don't want to store the attachments there, so I would like to keep only blobstore keys. I know that I can write files to the blobstore. The questions I have:
1. How do I extract files from the mail attachments?
2. How do I keep the original filenames?
3. How do I store blob keys in the datastore (taking into account that one mail can contain several attachments, a single BlobKeyProperty() doesn't seem to work in this case)?
Upd. For (1) the following code can be used:
my_file = []
my_list = []
if hasattr(mail_message, 'attachments'):
    file_name = ""
    file_blob = ""
    for filename, filecontents in mail_message.attachments:
        file_name = filename
        file_blob = filecontents.decode()
        my_file.append(file_name)
        my_list.append(str(store_file(self, file_name, file_blob)))
You should use NDB instead of the old db datastore API. In NDB you can use repeated and structured repeated properties to save a list of blob keys (or blob properties) and filenames.
See : https://developers.google.com/appengine/docs/python/ndb/properties
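For example (a minimal sketch; the model and property names here are made up, not taken from the question), a message entity could pair each attachment's filename with its blob key using a repeated structured property:

from google.appengine.ext import ndb

class Attachment(ndb.Model):
    # Keeps the original filename next to the blobstore key.
    filename = ndb.StringProperty()
    blob_key = ndb.BlobKeyProperty()

class MailMessage(ndb.Model):
    sender = ndb.StringProperty()
    body = ndb.TextProperty()
    # One mail can contain several attachments, so the property is repeated.
    attachments = ndb.StructuredProperty(Attachment, repeated=True)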
Here is what I finally do:
import email
import webapp2
from google.appengine.api import files


class EmailHandler(webapp2.RequestHandler):
    def post(self):
        '''
        Receive incoming e-mails.
        Parse the message manually.
        '''
        msg = email.message_from_string(self.request.body)  # http://docs.python.org/2/library/email.parser.html
        for part in msg.walk():
            ctype = part.get_content_type()
            if ctype in ['image/jpeg', 'image/png']:
                image_file = part.get_payload(decode=True)
                image_file_name = part.get_filename()
                # save file to blobstore
                bs_file = files.blobstore.create(mime_type=ctype,
                                                 _blobinfo_uploaded_filename=image_file_name)
                with files.open(bs_file, 'a') as f:
                    f.write(image_file)
                files.finalize(bs_file)
                blob_key = files.blobstore.get_blob_key(bs_file)
blob_keys are stored in the datastore as ndb.BlobKeyProperty(repeated=True).
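For reference, a minimal sketch of such a model (the class and property names are assumptions, not taken from the handler above):

from google.appengine.ext import ndb

class InboundEmail(ndb.Model):
    sender = ndb.StringProperty()
    subject = ndb.StringProperty()
    # Several attachments per message, so both lists are repeated.
    file_names = ndb.StringProperty(repeated=True)
    blob_keys = ndb.BlobKeyProperty(repeated=True)

# Inside the handler, collect blob_key and image_file_name into lists,
# then put() one entity per message with those lists.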
The classic Snowflake web UI and the new Snowsight are great at importing SQL from a file, but neither allows you to export SQL to a file. Is there a workaround?
You can use an IDE to connect to Snowflake and write queries. The scripts can then be downloaded using IDE features and synced with a git repo as well.
DBeaver is one such IDE that supports Snowflake:
https://hevodata.com/learn/dbeaver-snowflake/
The query pane is interactive, so the obvious workaround is:
CTRL + A (select all)
CTRL + C (copy)
<open_favourite_text_editor>
CTRL + V (paste)
CTRL + S (save)
This tool can help you while the team develops a native feature to export worksheets:
"Snowflake Snowsight Extensions wrap Snowsight features that do not have API or SQL alternatives, such as manipulating Dashboards and Worksheets, and retrieving Query Profile and step timings."
https://github.com/Snowflake-Labs/sfsnowsightextensions
Further explained on this post:
https://medium.com/snowflake/importing-and-exporting-snowsight-dashboards-and-worksheets-3cd8e34d29c8
For example, to save to a file within PowerShell:
PS > $dashboards | foreach {$_.SaveToFolder("path/to/folder")}
PS > $dashboards[0].SaveToFile("path/to/folder/mydashboard.json")
ETA: I'm adding this edit to the front because this is what actually worked.
Again, BSON was a dead end and punycode is irrelevant. I don't know why punycode is referenced in the metadata file; my best guess is that they use punycode to encode the worksheet name itself (though I'm not sure why that would be needed, since it shouldn't have to be part of a URL).
After doing terrible things and trying a number of complex ways of dealing with escape-character hell, I found that the actual encoding is very simple. It works as an 8-bit encoding with anything that might cause problems (null, control codes, double quotes, etc.) escaped away. To load a worksheet: treat the file as text in an 8-bit encoding, extract the body as a JSON field, then re-encode that extracted string with the same 8-bit encoding. I used latin_1 to read; it may not even matter which encoding you use, as long as you are consistent and re-encode with the same one. The re-encoded field is then valid zlib-compressed data.
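In code, the steps above boil down to something like this (a minimal sketch; the file name is a placeholder and latin_1 is just the 8-bit encoding I happened to use):

import json
import zlib

# Read the staged worksheet file as text in an 8-bit encoding.
with open('worksheet_file.json', 'r', encoding='latin_1') as f:
    doc = json.load(f)

# Re-encode the extracted body with the same 8-bit encoding;
# the resulting bytes are valid zlib-compressed data.
body_bytes = doc['wsContents']['body'].encode('latin_1')
worksheet_sql = zlib.decompress(body_bytes).decode('utf8')
print(worksheet_sql)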
I decided that I wanted to start from scratch, so I needed to back up the worksheets first, and I made a Python script based on my findings above. Be warned that this may return even worksheets that you previously closed for good. After running this and verifying that backups were created, I just ran rm #~/worksheet_data/;, closed the tab & reopened it.
Here's the code (fill in the appropriate base directory location):
import os
from collections import OrderedDict
import configparser
from sqlalchemy import create_engine, exc
from snowflake.sqlalchemy import URL
import pathlib
import json
import zlib
import string


def format_filename(s: str) -> str:  # From https://gist.github.com/seanh/93666
    """Take a string and return a valid filename constructed from the string.

    Uses a whitelist approach: any characters not present in valid_chars are
    removed. Also spaces are replaced with underscores.

    Note: this method may produce invalid filenames such as ``, `.` or `..`
    When I use this method I prepend a date string like '2009_01_15_19_46_32_'
    and append a file extension like '.txt', so I avoid the potential of using
    an invalid filename.
    """
    valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
    filename = ''.join(c for c in s if c in valid_chars)
    # filename = filename.replace(' ', '_')  # I don't like spaces in filenames.
    return filename
def trlng_dash(s: str) -> str:
    """Removes a trailing dash if present."""
    return s[:-1] if s[-1] == '-' else s
sso_authenticate = True

# Assumes CLI config file exists.
config = configparser.ConfigParser()
home = pathlib.Path.home()
config_loc = home/'.snowsql/config'  # Assumes it's set up from Snowflake CLI.

base_dir = home/r'{your Desired base directory goes here.}'
json_dir = base_dir/'json'  # Location for your worksheet stage JSON files.
sql_dir = base_dir/'sql'    # Location for your worksheets.

# Assumes CLI config file exists.
config.read(config_loc)

# Add connection parameters here (assumes CLI config exists).
# Using SSO, so only 2 are needed.
# If there's no config file, etc., enter them by hand here (or however you want to do it).
connection_params = {
    'account': config['connections']['accountname'],
    'user': config['connections']['username'],
}

if sso_authenticate:
    connection_params['authenticator'] = 'externalbrowser'
if config['connections'].get('password', None) is not None:
    connection_params['password'] = config['connections']['password']
if config['connections'].get('rolename', None) is not None:
    connection_params['role'] = config['connections']['rolename']
if locals().get('database', None) is not None:
    connection_params['database'] = database
if locals().get('schema', None) is not None:
    connection_params['schema'] = schema

sf_engine = create_engine(URL(**connection_params))

if not base_dir.exists():
    base_dir.mkdir()
if not json_dir.exists():
    json_dir.mkdir()
if not sql_dir.exists():
    sql_dir.mkdir()
with sf_engine.connect() as connection:
    connection.execute(f'get #~/worksheet_data/ \'file://{str(json_dir.as_posix())}\';')

for file in [path for path in json_dir.glob('*') if path.is_file()]:
    if file.suffix != '.json':
        file.replace(file.with_suffix(file.suffix + '.json'))

with open(json_dir/'metadata.json', 'r') as metadata_file:
    files_meta = json.load(metadata_file)

# List of files from metadata file will contain some empty worksheets.
files_description_orig = OrderedDict(
    (file_key_value['name'], file_key_value)
    for file_key_value in sorted(
        files_meta['activeWorksheets'] + list(files_meta['inactiveWorksheets'].values()),
        key=lambda x: x['name'])
    if file_key_value['name'])

# files_description will only track non-empty worksheets.
files_description = files_description_orig.copy()

# Create an updated files description, filtering out empty worksheets.
for item in files_description_orig:
    json_file = json_dir/f"{files_description_orig[item]['name']}.json"
    # If a file didn't make it or was deleted by hand, remove it from
    # the filtered description & continue to the next item.
    if not (json_file.exists() and json_file.is_file()):
        del files_description[item]
        continue
    with open(json_file, 'r', encoding='latin_1') as f:
        json_dat = json.load(f)
        # If the file represents a worksheet with a body field, we want it.
        if not json_dat['wsContents'].get('body'):
            del files_description[item]
            ## Delete JSON files corresponding to empty worksheets.
            # f.close()
            # try:
            #     (json_dir/f"{files_description_orig[item]['name']}.json").unlink()
            # except:
            #     pass
# Produce a list of normalized filenames (no illegal or awkward characters).
file_names = set(
    format_filename(trlng_dash(files_description[item]['encodedDetails']['scriptName']).strip())
    for item in files_description)

# Add useful information to our files_description OrderedDict.
for file_name in file_names:
    repeats_cnt = 0
    file_name_repeats = (
        item
        for item
        in files_description
        if file_name == format_filename(trlng_dash(files_description[item]['encodedDetails']['scriptName']).strip())
    )
    for file_uuid in file_name_repeats:
        files_description[file_uuid]['normalizedName'] = file_name
        files_description[file_uuid]['stemSuffix'] = '' if repeats_cnt == 0 else f'({repeats_cnt:0>2})'
        repeats_cnt += 1

# Now we iterate over non-empty worksheets only.
for item in files_description:
    json_file = json_dir/f"{files_description[item]['name']}.json"
    with open(json_file, 'r', encoding='latin_1') as f:
        json_dat = json.load(f)
        body = json_dat['wsContents']['body']
    body_bin = body.encode('latin_1')
    body_txt = zlib.decompress(body_bin).decode('utf8')
    sql_file = sql_dir/f"{files_description[item]['normalizedName']}{files_description[item]['stemSuffix']}.sql"
    with open(sql_file, 'w') as sql_f:
        sql_f.write(body_txt)
    creation_stamp = files_description[item]['created']/1000
    os.utime(sql_file, (creation_stamp, creation_stamp))

print('Done!')
As mentioned at Is there any option in snowflake to save or load worksheets? (and in Snowflake's own documentation), in the Classic UI, the worksheets are saved at the user stage under #~/worksheet_data/.
You can download it with a get command like:
get #~/worksheet_data/<name> file:///<your local location>; (though you might need quoting if running from Windows).
The problem is that I do not know how to access them programmatically. The downloaded files look like JSON, but they are not valid JSON. The main key is "wsContents" and contains most of the worksheet information. Its value includes two subkeys, "encoding" and "body".
The "encoding" key denotes that gzip is being used. The "body" key seems to be the actual worksheet data, which looks a lot like a straight binary representation of the compressed text data. As such, any JSON reader will choke on it.
If that is the case, I do not currently know how to access it programmatically using Python.
I do see that a JSON-like format, BSON, exists and is bundled into PyMongo. Trying to use it on these files fails. I even tried bson.is_valid and it returns False, so I assume these files in Snowflake are not actually BSON.
Edited to add: Again, BSON is a dead end.
Examining the "body" value as just binary data, the first two bytes of sample files do seem to correspond to default zlib compression (0x789c). However, attempting to run zlib.decompress directly on the slice from the first to the last byte of the "body" value results in the error:
Error - 3 while decompressing data: invalid code lengths set
This makes me think that the bytes there, as is, are at least partly garbage and still need some processing before they can be decompressed.
One clue that I failed to mention earlier is that the metadata file (called "metadata", which serves as an inventory of the remaining files at the #~/worksheet_data/ location) declares that the files use the punycode encoding. However, I have not figured out how to use that information. The data in these files doesn't particularly look like punycode to me, nor does it make sense to use punycode on binary data that is never meant to directly generate text, such as zlib-compressed data.
I am HTTP POSTing a .zip file to my Django app via Django REST Framework. This zip file is a folder that contains several files, among them an image. I would like to extract the folder once it's uploaded, select the image, and assign it to the model. Is this possible? If not, maybe I can write a property that gets the thumbnail image? I want to be able to show all the thumbnails in a gallery later.
Something like:
class FUploadSerializer(serializers.ModelSerializer):
    file = serializers.FileField()

    class Meta:
        model = FUpload
        fields = ('created', 'file')


class FUpload(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    file = models.FileField(upload_to=get_path)
    thumbnail = ImageField()  # ??? get from uploaded folder
EDIT
I tried the following, but am getting:
"image": ["The submitted data was not a file. Check the encoding type on the form."]
Am I thinking about this completely backwards? Is it "better" to let the client send the image and the .zip folder separately (the data is sent from a script)? Is the ModelViewSet the right place to handle this?
class Thumbnail(models.Model):
    image = models.ImageField()
    file = models.FileField()


class ThumbnailViewSet(viewsets.ModelViewSet):
    queryset = Thumbnail.objects.all()
    serializer_class = ThumbnailSerializer
    parser_classes = (MultiPartParser, FormParser,)

    def perform_create(self, serializer):
        file = self.request.FILES['file']
        zf = zipfile.ZipFile(file)
        content_list = zf.namelist()
        imgdata = zf.open(content_list[0])
        serializer.save(image=imgdata)


class ThumbnailSerializer(serializers.ModelSerializer):
    file = serializers.FileField()
    image = serializers.ImageField(allow_empty_file=True, required=False)

    class Meta:
        model = Thumbnail
        fields = ('file', 'image')
In order to make this work, you are going to need to implement your own custom parser that can handle the incoming zip file, extract it, and assign some of the files to specific fields.
import io
import zipfile

from rest_framework.parsers import FileUploadParser


class ZipParser(FileUploadParser):
    """
    Parses an incoming ZIP file into separate fields.

    The ZIP file is expected to be the only data passed in the request;
    use `MultiPartParser` as a base if it will be passed in using form
    data instead.
    """
    media_type = 'application/zip'  # only needed for `FileUploadParser`

    def parse(self, stream, media_type=None, parser_context=None):
        parsed = super(ZipParser, self).parse(
            stream,
            media_type=media_type,
            parser_context=parser_context,
        )
        data = parsed.files
        data['zip_file'] = data['file']  # Keep the original zip file as `zip_file`
        parsed_file = data['file']  # Also keep a reference to the original file so it can be read

        # Try to parse the zip file.
        # Proper error handling (such as a non-zip file being passed) should be added here.
        with zipfile.ZipFile(parsed_file) as zip_file:
            # Get the `bytes` object for a file using `read(file_name)`
            image_data = zip_file.read('image.png')  # Replace `image.png` with the image file
            file_data = zip_file.read('info.txt')

        # DRF expects files to be IO-compatible objects, so wrap the bytes in `BytesIO`
        image_stream = io.BytesIO(image_data)
        file_stream = io.BytesIO(file_data)

        # The extracted files go back onto the files dictionary
        data['image'] = image_stream
        data['file'] = file_stream

        return parsed
You can alternatively do this outside of a parser before the data is passed into a serializer if you are using your own custom view.
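For example, a perform_create override along these lines might work (a rough sketch, assuming the zip is posted as form data under the 'file' key as in the question; the entry name 'image.png' is a placeholder). Wrapping the extracted bytes in a Django ContentFile gives the ImageField an actual file object, which should avoid the "The submitted data was not a file" error:

import zipfile
from django.core.files.base import ContentFile
from rest_framework import viewsets

class ThumbnailViewSet(viewsets.ModelViewSet):
    queryset = Thumbnail.objects.all()
    serializer_class = ThumbnailSerializer

    def perform_create(self, serializer):
        uploaded = self.request.FILES['file']
        with zipfile.ZipFile(uploaded) as zf:
            image_bytes = zf.read('image.png')  # placeholder entry name
        # ContentFile carries a name, so Django treats it as a real file.
        serializer.save(image=ContentFile(image_bytes, name='image.png'))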
I am doing development with Python on GAE.
When I try to use ProtoRPC for a web service, I cannot find a way to let my request contain JSON-format data in a message. For example:
request format:
{"owner_id":"some id","jsondata":[{"name":"peter","dob":"1911-1-1","aaa":"sth str","xxx":sth int}, {"name":...}, ...]}'
python:
class some_function_name(messages.Message):
    owner_id = messages.StringField(1, required=True)
    jsondata = messages.StringField(2, required=True)  # is there a JSON field instead of StringField?
Any other suggestions?
What you'd probably want to do here is use a MessageField. You can define your nested message above or within your class definition and use it as the first parameter of the field definition. For example:
from protorpc.messages import Message, MessageField, StringField

class Person(Message):
    name = StringField(1)
    dob = StringField(2)

class ClassRoom(Message):
    teacher = MessageField(Person, 1)
    students = MessageField(Person, 2, repeated=True)
Alternatively:
class ClassRoom(Message):
    class Person(Message):
        ...
    ...
That will work too.
Unfortunately, if you want to store arbitrary JSON, as in any kind of JSON data whose structure you don't know ahead of time, that will not work. All fields must be predefined ahead of time.
I hope it's still helpful for you to use MessageField.
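Applied to the request shown in the question, the messages might look roughly like this (a sketch; the class names and the integer type for xxx are assumptions):

from protorpc import messages

class JsonDataItem(messages.Message):
    name = messages.StringField(1)
    dob = messages.StringField(2)
    aaa = messages.StringField(3)
    xxx = messages.IntegerField(4)

class SomeRequest(messages.Message):
    owner_id = messages.StringField(1, required=True)
    # Each element of the JSON array becomes one nested message.
    jsondata = messages.MessageField(JsonDataItem, 2, repeated=True)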
The following data is uploaded to my GAE application -
How can I
1. get the fields that contain files only?
2. get the filenames of the uploaded files?
get fields with files only
import cgi
values = self.request.POST.itervalues()
files = [v for v in values if isinstance(v, cgi.FieldStorage)]
get filenames of the uploaded files
filenames = [f.filename for f in files]
Edit: corrected snippet, now tested :)
Assuming the data is POSTed using a form, for #2, see Get original filename google app engine
For #1, you could iterate through the self.request.POST multidict and see anything that looks like a file. self.request.POST looks like this:
UnicodeMultiDict([(u'file_1', FieldStorage(u'file_1', u'filename_1')), (u'random_string_field', u'random_string_value')])
Hope that helps you out
-Sam
filename = self.request.POST['file'].filename
file_ext = self.request.POST['file'].type
OR
filename = self.request.params[<form element name with file>].filename
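Putting the two parts together, a handler might look like this (a minimal sketch assuming webapp2 and a multipart form POST, mirroring the snippets above):

import cgi
import webapp2

class UploadHandler(webapp2.RequestHandler):
    def post(self):
        # Keep only the form fields that actually carry files.
        files = [v for v in self.request.POST.itervalues()
                 if isinstance(v, cgi.FieldStorage)]
        filenames = [f.filename for f in files]
        content_types = [f.type for f in files]
        self.response.write('\n'.join(filenames))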
I was wondering if I'm unknowingly calling the put() method in my last line of code (please have a look). Thanks.
from google.appengine.ext import db


class User(db.Model):
    name = db.StringProperty()
    total_points = db.IntegerProperty()
    points_activity_1 = db.IntegerProperty(default=100)
    points_activity_2 = db.IntegerProperty(default=200)

    def calculate_total_points(self):
        self.total_points = self.points_activity_1 + self.points_activity_2

# initialize a user (this is obviously a put() call)
User(key_name="key1", name="person1").put()

# get user by key name
user = User.get_by_key_name("key1")

# QUESTION: is this also a put()? It worked and updated my user entity's total points.
User.calculate_total_points(user)
While that method will certainly update the copy of the object that is in memory, I do not see any reason to believe that the change will be persisted to the datastore. Datastore write operations are costly, so they are not going to happen implicitly.
After running this code, use the datastore viewer to look at the copy of the object in the datastore. I think you may find that it does not have the changed total_points value.
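In other words, to persist the change you need an explicit put(), e.g. (a small sketch based on the code above):

user = User.get_by_key_name("key1")
user.calculate_total_points()  # updates only the in-memory copy
user.put()                     # this is the explicit write to the datastore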