I have properly implemented InboundMailHandler and I'm able to process all other mail_message fields except mail_message.attachments. The attachment filename is read properly, but the contents are not being saved with the proper mime_type:
if not hasattr(mail_message, 'attachments'):
    raise ProcessingFailedError('Email had no attached documents')
else:
    logging.info("Email has %i attachment(s)" % len(mail_message.attachments))
    for attach in mail_message.attachments:
        filename = attach[0]
        contents = attach[1]
        # Create the file
        file_name = files.blobstore.create(mime_type="application/pdf")
        # Open the file and write to it
        with files.open(file_name, 'a') as f:
            f.write(contents)
        # Finalize the file. Do this before attempting to read it.
        files.finalize(file_name)
        # Get the file's blob key
        blob_key = files.blobstore.get_blob_key(file_name)
        return blob_key

blob_info = blobstore.BlobInfo.get(blob_key)
When I try to display the imported pdf file by going to the URL '/serve/%s' % blob_info.key(), I get a page with what looks like encoded data instead of the actual pdf file.
It looks like this:
From nobody Thu Aug 4 23:45:06 2011 content-transfer-encoding: base64 JVBERi0xLjMKJcTl8uXrp/Og0MTGCjQgMCBvYmoKPDwgL0xlbmd0aCA1IDAgUiAvRmlsdGVyIC9G bGF0ZURlY29kZSA+PgpzdHJlYW0KeAGtXVuXHLdxfu9fgSef2RxxOX2by6NMbSLalOyQK+ucyHpQ eDE3IkWKF0vJj81vyVf3Qu9Mdy+Z40TswqKAalThqwJQjfm1/Hv5tWzxv13blf2xK++el+/LL+X+ g/dtefq
Any ideas? Thanks
The email's attachments are EncodedPayload objects; your code writes the still-encoded payload verbatim, which is why the served blob shows base64 text. To get the data you should call the decode() method.
Try with:
# Open the file and write the decoded contents to it
with files.open(file_name, 'a') as f:
    f.write(contents.decode())
If you want attachments larger than 1 MB to be processed successfully, decode, convert to str, and write in chunks:
# Decode and convert to string
datastr = str(contents.decode())
with files.open(file_name, 'a') as f:
    # Write in 64 KB chunks until the data is exhausted
    while len(datastr) > 0:
        f.write(datastr[0:65536])
        datastr = datastr[65536:]
Found the answer in this excellent blog post:
http://john-smith.appspot.com/app-engine--what-the-docs-dont-tell-you-about-processing-inbound-mail
This is how to decode an email attachment for GAE inbound mail:
for attach in mail_message.attachments:
    filename, encoded_data = attach
    data = encoded_data.payload
    if encoded_data.encoding:
        data = data.decode(encoded_data.encoding)
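Putting the pieces together, a minimal sketch (assuming the same files API as in the question, and using the standard library's mimetypes module to guess the content type from the attachment's filename instead of hard-coding application/pdf):

import mimetypes

for attach in mail_message.attachments:
    filename, encoded_data = attach
    data = encoded_data.payload
    if encoded_data.encoding:
        data = data.decode(encoded_data.encoding)
    # Guess the MIME type from the filename; fall back to a generic type
    mime_type = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
    file_name = files.blobstore.create(mime_type=mime_type)
    with files.open(file_name, 'a') as f:
        f.write(data)
    files.finalize(file_name)
    blob_key = files.blobstore.get_blob_key(file_name)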
I want to extract the .tif files from a range of URLs. This code works for one zip file, but when I try to extract zips over range(1, 43) it doesn't work.
The error is:
BadZipFile: File is not a zip file
Could somebody help me?
import requests
from io import BytesIO
from zipfile import ZipFile

print('Downloading started')
for number in range(1, 3):
    url = f'https://downloadagiv.blob.core.windows.net/dhm-vlaanderen-ii-dsm-raster-1m/DHMVIIDSMRAS1m_k{number}.zip'
    # Split the URL to get the file name
    filename = url.split('/')[-1]
    req = requests.get(url)
    print('Downloading Completed')
    zip_file = ZipFile(BytesIO(req.content))
    listOfFileNames = zip_file.namelist()
    for filename in listOfFileNames:
        # Check whether the filename ends with .tif
        if filename.endswith('.tif'):
            # Extract a single file from the zip
            zip_file.extract(filename, '/content/gdrive/My Drive')
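One plausible cause worth checking (an assumption, since the failing URL isn't shown): if any request in the range fails, the server returns an error page whose bytes are not a zip archive, and ZipFile then raises BadZipFile. A small guard makes such failures explicit:

import requests
from io import BytesIO
from zipfile import ZipFile

url = 'https://downloadagiv.blob.core.windows.net/dhm-vlaanderen-ii-dsm-raster-1m/DHMVIIDSMRAS1m_k1.zip'
req = requests.get(url)
req.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
if not req.content.startswith(b'PK'):  # zip archives begin with the 'PK' magic bytes
    raise ValueError(f'{url} did not return a zip archive')
zip_file = ZipFile(BytesIO(req.content))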
How can I save the output file from Run query and list results in the .PARQUET file format?
This is my current workflow.
My Logic App is working, but the .parquet files it creates are not valid whenever I open them in Apache Parquet Viewer.
Can someone help me with this matter? Thank you!
I see that you are trying to add .parquet to the CSV file you are receiving, but that's not how it gets converted to a parquet file.
One workaround you can try is to get the CSV file, add an Azure Function that converts it into a parquet file, and then add that Azure Function to your Logic App.
Here is the function that worked for me:
BlobServiceClient blobServiceClient = new BlobServiceClient("<YOUR CONNECTION STRING>");
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient("<YOUR CONTAINER NAME>");
BlobClient blobClient = containerClient.GetBlobClient("sample.csv");

// Download the blob
Stream file = File.OpenWrite(@"C:\Users\<USER NAME>\source\repos\ParquetConsoleApp\ParquetConsoleApp\bin\Debug\netcoreapp3.1\" + blobClient.Name);
await blobClient.DownloadToAsync(file);
Console.WriteLine("Download completed!");
file.Close();

// Read the downloaded blob back (Stream has no ReadToEnd; use a StreamReader)
using (var reader = new StreamReader(blobClient.Name))
{
    Console.WriteLine(reader.ReadToEnd());
}

// Convert to parquet
using (var r = new ChoCSVReader(@"C:\Users\<USER NAME>\source\repos\ParquetConsoleApp\ParquetConsoleApp\bin\Debug\netcoreapp3.1\" + blobClient.Name))
using (var w = new ChoParquetWriter(@"C:\Users\<USER NAME>\source\repos\ParquetConsoleApp\ParquetConsoleApp\bin\Debug\netcoreapp3.1\convertedParquet.parquet"))
{
    w.Write(r);
}
After this step you can publish your Azure Function and add the Azure Function connector to your Logic App.
You can skip the first two steps (i.e., downloading and reading the blob), get the blob directly from the Logic App, send it to your Azure Function, and follow the same method as above. The generated parquet file will be at this path:
C:\Users\<USERNAME>\source\repos\ParquetConsoleApp\ParquetConsoleApp\bin\Debug\netcoreapp3.1\convertedParquet.parquet
Here convertedParquet.parquet is the name of the parquet file. Now you can read the converted parquet file in an Apache Parquet reader.
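If your Azure Function can run Python instead of C#, roughly the same conversion is a few lines with pandas (an alternative sketch, assuming pandas and a parquet engine such as pyarrow are available in the Function's environment):

import pandas as pd

def convert_csv_to_parquet(csv_path: str, parquet_path: str) -> None:
    # Read the CSV produced by the Logic App and write it back out as parquet
    df = pd.read_csv(csv_path)
    df.to_parquet(parquet_path)  # requires pyarrow or fastparquet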
The workflow of my function is the following:
retrieve a jpg through a Python GET request
save the image as png (even though it is downloaded as jpg) on disk
use imageio to read the image from disk and transform it into a numpy array
work with the array
This is what I do to save:
response = requests.get(urlstring, params=params)
if response.status_code == 200:
    with open('PATH%d.png' % imagenumber, 'wb') as output:
        output.write(response.content)
This is what I do to load the png and transform it into an np.array:
imagearray = im.imread('PATH%d.png' % imagenumber)
Since I don't need to permanently store what I download, I tried to modify my function to transform response.content into a numpy array directly. Unfortunately, every imageio-like library works the same way, reading a URI from disk and converting it to an np.array.
I tried this, but obviously it didn't work, since imread expects a URI as input:
response = requests.get(urlstring, params=params)
imagearray = im.imread(response.content)
Is there any way to overcome this issue? How can I transform my response.content into an np.array?
imageio.imread is able to read from URLs:
import imageio
url = "https://example_url.com/image.jpg"
# image is going to be type <class 'imageio.core.util.Image'>
# that's just an extension of np.ndarray with a meta attribute
image = imageio.imread(url)
You can find more information in the documentation; they also have examples: https://imageio.readthedocs.io/en/stable/examples.html
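If I remember correctly, recent imageio versions also accept raw bytes as the uri argument, so passing response.content directly may work too (worth verifying against your installed version):

import imageio
import requests

response = requests.get("https://example_url.com/image.jpg")
image = imageio.imread(response.content)  # raw bytes, no file on disk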
You can use BytesIO as a file to skip writing to an actual file. response.content already holds the raw image bytes, so wrap it directly:
from io import BytesIO
from PIL import Image
import numpy as np

bites = BytesIO(response.content)
Now you have it as BytesIO, so you can use it just like a file:
img = Image.open(bites)
img_np = np.array(img)
I have some openpyxl code in my backend service (Google App Engine) and I'd like to load a file from Google Cloud Storage / Blobstore, but passing the file stream (via a blobstore reader) doesn't appear to be valid for load_workbook. xlrd has an option to pass file contents (Reading contents of excel file in python webapp2). Is there something similar for openpyxl?
blobstore_filename = '/gs{}'.format('/mybucket/mycloudstorefilename.xlsx')
blob_key = blobstore.create_gs_key(blobstore_filename)
# I've tried each of these BlobReader variants:
blob_reader = blobstore.BlobReader(blob_key)
blob_reader = blobstore.BlobReader(blob_key, buffer_size=1048576)
blob_reader = blobstore.BlobReader(blob_key, position=0)
blob_reader_data = blob_reader.read()
load_workbook(blob_reader_data)
The error is:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x9d in position 11: ordinal not in range(128)
Found the missing link:
Using openpyxl to read file from memory
I needed to convert the file stream into bytes.
from io import BytesIO
...
wb = load_workbook(BytesIO(blob_reader_data))
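Putting it together end to end, a sketch (assuming the App Engine blobstore API from the question; read_only=True is an optional optimization for large workbooks):

from io import BytesIO
from openpyxl import load_workbook
from google.appengine.ext import blobstore

blobstore_filename = '/gs{}'.format('/mybucket/mycloudstorefilename.xlsx')
blob_key = blobstore.create_gs_key(blobstore_filename)
blob_reader = blobstore.BlobReader(blob_key)

# load_workbook expects a file-like object of bytes, not a str of the contents
wb = load_workbook(BytesIO(blob_reader.read()), read_only=True)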
So, I'm downloading some data files from an FTP server. I need to go in daily, retrieve the new files, and save them on my PC, but only the new ones.
Code so far:
from ftplib import FTP
import os

ftp = FTP('ftp.example.com')
ftp.login()
ftp.retrlines('LIST')

filenames = ftp.nlst()

for filename in filenames:
    if filename not in ['..', '.']:
        local_filename = os.path.join('C:\\Financial Data\\', filename)
        file = open(local_filename, mode='x')
        ftp.retrbinary('RETR ' + filename, file.write)
I was thinking of using if not os.path.exists(), but I need os.path.join for this to work. Using open() with mode='x' as above, I get the following error message: "FileExistsError: [Errno 17] File exists".
Is error handling the way to go, or is there a neat trick that I'm missing?
I landed on the following solution:
filenames_ftp = ftp.nlst()
filenames_loc = os.listdir("C:\\Financial Data\\")
filenames = list(set(filenames_ftp) - set(filenames_loc))
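Folding that set difference back into the download loop, a sketch (note 'xb' rather than 'x': retrbinary delivers raw bytes, so the file must be opened in binary mode):

import os
from ftplib import FTP

ftp = FTP('ftp.example.com')
ftp.login()

local_dir = 'C:\\Financial Data\\'
filenames_ftp = ftp.nlst()
filenames_loc = os.listdir(local_dir)
new_files = set(filenames_ftp) - set(filenames_loc)

for filename in new_files:
    if filename not in ('..', '.'):
        local_filename = os.path.join(local_dir, filename)
        # 'xb' still refuses to overwrite, but now only as a safety net
        with open(local_filename, mode='xb') as f:
            ftp.retrbinary('RETR ' + filename, f.write)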