After the App Engine Files API service turndown, we can no longer directly create and write blobs. So how do we copy/transfer blobs from one app to another?
To "write" blobs, you can use blob upload to post blobs to your receiving app. To do this you have to post a multipart encoded file using an upload_url of the receiving app.
import mimetypes

from google.appengine.api import urlfetch


def post_blob(url, blob_name, blob):
    CRLF = '\r\n'
    BOUNDARY = '--Th15-Is-ThE-BoUnDaRy--'
    payload = '--' + BOUNDARY + CRLF
    payload += ('Content-Disposition: form-data; name="file"; filename="%s"'
                % blob_name) + CRLF
    payload += 'Content-Type: %s' % mimetypes.guess_type(blob_name)[0] + CRLF
    payload += CRLF
    payload += blob + CRLF
    payload += '--%s--' % BOUNDARY + CRLF
    urlfetch.fetch(
        url=url,
        payload=payload,
        method=urlfetch.POST,
        deadline=40,
        follow_redirects=False,
        headers={'Content-Type': 'multipart/form-data; boundary=%s' % BOUNDARY},
    )
url = get_receiving_app_blobstore_upload_url()  # request an upload url from the receiving app
blob_info = blobstore.BlobInfo.get(blob_key)    # blob_key identifies the blob to copy
blob_reader = blobstore.BlobReader(blob_key)
post_blob(url, blob_info.filename, blob_reader.read())
In the receiving app you have to create two handlers:
- one that hands out an upload URL created with blobstore.create_upload_url()
- one that handles the POST above (a blobstore upload handler)
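For illustration, a minimal sketch of those two handlers, assuming the Python webapp2 runtime (route paths and handler names are made up for this example):

import webapp2
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers


class GetUploadUrlHandler(webapp2.RequestHandler):
    def get(self):
        # hand out a one-shot upload url; the multipart POST lands on /upload-done
        self.response.write(blobstore.create_upload_url('/upload-done'))


class UploadDoneHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        # the uploaded file arrives under the form field name "file"
        blob_info = self.get_uploads('file')[0]
        self.response.write(str(blob_info.key()))


app = webapp2.WSGIApplication([
    ('/get-upload-url', GetUploadUrlHandler),
    ('/upload-done', UploadDoneHandler),
])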
I am trying to solve a classic ETL problem using streaming. I have a batch of segments; each segment holds information about its associated records (number of records, the URL to retrieve them from, etc.) needed to issue an HTTP request to collect the data. I need to extract the records from a source with a paging size of 100 records, merge the pages of records for each segment, wrap them in an XML header and footer, and then send each segment's XML payload to a target.
                 {http}
                 page 1
                /      \
        seg 1 >  page 2  -> merge -> wrapHeaderAndFooter -> http target
       /        \      /
      /          page n
     /
batch -- seg 2   "                                       -> http target
      \_ seg n   "                                       -> http target
val loadSegment: Flow[Segment, Response, NotUsed] = {
  Flow[Segment].mapAsync(parallelism = 5) { segment =>
    val pages: Source[ByteString, NotUsed] = pagedPayload(segment).map(page => page.payload)
    // Using source concatenation to prepend the XML root start tag and append the end tag
    val wrappedInXML: Source[ByteString, NotUsed] = xmlRootStartTag ++ pages ++ xmlRootEndTag
    // note: the entity must be built from wrappedInXML, not pages, or the wrapping is lost
    val httpEntity: HttpEntity = HttpEntity(ContentTypes.`application/octet-stream`, wrappedInXML)
    invokeTargetLoad(httpEntity, request, segment)
  }
}
def pagedPayload(segment: Segment): Source[Payload, NotUsed] = {
  val totalPages: Int = calculateTotalPages(segment.instanceCount)
  Source(0 until totalPages).mapAsyncUnordered(parallelism = 5) { i =>
    // .get rethrows a failed page's exception and fails the whole stream
    sendPayloadRequest(request, segment, i).mapTo[Try[Payload]].map(_.get)
  }
}
val batch: Batch = someBatch
Source(batch.segments)
  .via(loadSegment)
  .runWith(Sink.ignore)
  .andThen {
    case Success(value) => log("success")
    case Failure(error) => report(error)
  }
Is there a better approach? I am trying to use HttpEntity.Chunked encoding to stream the pages. Sometimes the first request from the source takes longer due to warm-up, and the target truncates the stream because no data has arrived yet. Is there a way to delay the actual connection to the target until the first page is available in the stream?
I would have preferred to write something like the pipeline below. If that is possible, how would the methods wrapXMLHeader and toHttpEntity be implemented?
val splitPages: Flow[BuildSequenceSegment, Seq[PageRequest], NotUsed] = ???
val requestPayload: Flow[Seq[PageRequest], Seq[PageResponse], NotUsed] = ???
val wrapXMLHeader: Flow[Seq[PageResponse], Seq[PageResponse], NotUsed] = ???
val toHttpEntity: Flow[Seq[PageResponse], HttpEntity.Chunked, NotUsed] = ???
val invokeTargetLoad: Flow[HttpEntity.Chunked, RestResponse, NotUsed] = ???
Source(batch.segments)
  .via(splitPages)
  .via(requestPayload)
  .via(wrapXMLHeader)
  .via(toHttpEntity)
  .via(invokeTargetLoad)
  .runWith(Sink.ignore)
I have a list of around 2500 mail IDs and I'm restricted to using only the requests library, so far I fetch the mail headers one message at a time:
mail_ids = ['']
for mail_id in mail_ids:
    res = requests.get(
        'https://www.googleapis.com/gmail/v1/users/me/messages/{}?format=metadata'.format(mail_id),
        headers=headers).json()
    mail_headers = res['payload']['headers']
    ...
But it's very inefficient, and I would rather POST a list of IDs instead; however, in their documentation https://developers.google.com/gmail/api/v1/reference/users/messages/get I don't see a batchGet. Any workaround? I'm using the Flask framework. Thanks a lot.
This is a bit late, but in case it helps anyone, here's the code I used to do a batch get of emails:
First I get a list of relevant emails. Change the request according to your needs; I'm getting only sent emails for a certain time period:
query = "https://www.googleapis.com/gmail/v1/users/me/messages?labelIds=SENT&q=after:2020-07-25 before:2020-07-31"
response = requests.get(query, headers=header)
events = json.loads(response.content)
email_tokens = events['messages']
while 'nextPageToken' in events:
response = requests.get(query+f"&pageToken={events['nextPageToken']}",
headers=header)
events = json.loads(response.content)
email_tokens += events['messages']
Then I batch a GET request to fetch 100 emails at a time, parsing only the JSON part of each response and appending it to a list called emails. Note that there's some repeated code here, so you may want to refactor it into a method (a sketch follows the code below). You'll have to set your access token here:
emails = []
access_token = '1234'  # your OAuth2 access token
header = {'Authorization': 'Bearer ' + access_token}
batch_header = header.copy()
batch_header['Content-Type'] = 'multipart/mixed; boundary="email_id"'

data = ''
ctr = 0
for token_dict in email_tokens:
    data += ('--email_id\nContent-Type: application/http\n\n'
             f'GET /gmail/v1/users/me/messages/{token_dict["id"]}?format=full\n\n')
    if ctr == 99:  # flush a full batch of 100 sub-requests
        data += '--email_id--'
        r = requests.post("https://www.googleapis.com/batch/gmail/v1",
                          headers=batch_header, data=data)
        bodies = r.content.decode().split('\r\n')
        for body in bodies:
            if body.startswith('{'):
                parsed_body = json.loads(body)
                emails.append(parsed_body)
        ctr = 0
        data = ''
        continue
    ctr += 1

# send the final, partially filled batch
data += '--email_id--'
r = requests.post("https://www.googleapis.com/batch/gmail/v1",
                  headers=batch_header, data=data)
bodies = r.content.decode().split('\r\n')
for body in bodies:
    if body.startswith('{'):
        parsed_body = json.loads(body)
        emails.append(parsed_body)
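For reference, here is how that repeated post-and-parse step could be pulled into a helper; the function name and structure are my own, not part of the original answer:

def post_batch(data, batch_header, emails):
    # POST one multipart/mixed batch to the Gmail batch endpoint
    # and collect every JSON body from the multipart response
    r = requests.post("https://www.googleapis.com/batch/gmail/v1",
                      headers=batch_header, data=data + '--email_id--')
    for body in r.content.decode().split('\r\n'):
        if body.startswith('{'):
            emails.append(json.loads(body))

The loop then only builds data, calls post_batch every 100 messages, and calls it once more after the loop for the final partial batch.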
[Optional] Finally, I decode the text in each email and store only the last sent message instead of the whole thread. The regex used here splits on strings that I found usually appear at the end of emails, for instance "On Tue, Jun 23, 2020, x#gmail.com said...":
import re
import base64

gmail_split_regex = r'On [a-zA-Z]{3}, ([a-zA-Z]{3}|\d{2}) ([a-zA-Z]{3}|\d{2}),? \d{4}'
for email in emails:
    if 'parts' not in email['payload']:
        continue
    for part in email['payload']['parts']:
        if part['mimeType'] == 'text/plain':
            if 'uniqueBody' not in email:
                data = part['body']['data']
                # pad to a multiple of 4 in case the payload arrives unpadded
                plain_text = base64.urlsafe_b64decode(data + '=' * (-len(data) % 4)).decode('utf-8')
                email['uniqueBody'] = {'content': re.split(gmail_split_regex, plain_text)[0]}
        elif 'parts' in part:
            for sub_part in part['parts']:
                if sub_part['mimeType'] == 'text/plain':
                    if 'uniqueBody' not in email:
                        data = sub_part['body']['data']
                        plain_text = base64.urlsafe_b64decode(data + '=' * (-len(data) % 4)).decode('utf-8')
                        email['uniqueBody'] = {'content': re.split(gmail_split_regex, plain_text)[0]}
How can I send inline images using the Gmail API? Here is my sample code.
Kindly let me know what I am missing.
String email =
    "Content-Type: multipart/related; boundary:\"multipart_related_boundary\"\r\n" +
    "MIME-Version: 1.0\r\n" +
    FROM_ME +
    TO + toAddress + "\r\n" +
    SUBJECT + "welcome" + "\r\n" +
    "--multipart_related_boundary" + "\r\n" +
    "Content-type: image/gif; name=\"083.gif\"\r\n" +
    "MIME-Version: 1.0\r\n" +
    "Content-ID: <083.gif>\r\n" +
    "Content-Disposition: inline\r\n" +
    "--multipart_related_boundary" + "\r\n" +
    "MIME-Version: 1.0\r\n" +
    "Content-Type: text/html; charset=utf-8\r\n" +
    CONTENT_TRANSFER_ENCODING_QUOTED_PRINTABLE +
    "<html><body><img src=\"cid:083.gif\"/> welcome " +
    "</body></html>\r\n\r\n";
byte[] converted = Base64.encodeBase64(email.getBytes());
String encodedStr = new String(converted);
encodedStr = encodedStr.replace("/", "_").replace("+", "-");

MediaType mediaType = MediaType.parse(APPLICATION_JSON);
RequestBody body = RequestBody.create(mediaType, RAW + encodedStr + END_BRACKET);
Request request = new Request.Builder()
    .url(HTTPS_WWW_GOOGLEAPIS_COM_GMAIL_V1_USERS_ME_MESSAGES_SEND)
    .post(body)
    .addHeader(AUTHORIZATION,
        BEARER + gmailAuthService.getRefreshToken(token).getAccessToken())
    .addHeader(CONTENT_TYPE, "multipart/related; boundary:\"multipart_related_boundary\"")
    .build();
Response response = okHttpClient.newCall(request).execute();
Finally, in my Gmail I am not able to see the inline image.
You are including the image section of your MIME message, but not the image itself.
After the Content-ID: <083.gif> and Content-Disposition: inline headers you need to include the actual image. Specifically, you probably want to add a Content-Transfer-Encoding: base64 header to that section and include a base64 encoded image payload.
An easy way to see how it could/should work is to use Gmail to email yourself a short test message with a small image. Then, in the Gmail web UI, go to the message options (near the Reply button) and select "Show original". That will show you exactly how the MIME message is built.
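Not the asker's Java stack, but for comparison, here is a sketch of the required structure using Python's standard email library, which adds the base64 transfer encoding for the image part automatically (file name and recipient are illustrative):

import base64
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart('related')
msg['To'] = 'someone@example.com'
msg['Subject'] = 'welcome'
msg.attach(MIMEText('<html><body><img src="cid:083.gif"/> welcome</body></html>', 'html'))

with open('083.gif', 'rb') as f:
    img = MIMEImage(f.read())  # sets Content-Type: image/gif and Content-Transfer-Encoding: base64

img.add_header('Content-ID', '<083.gif>')
img.add_header('Content-Disposition', 'inline', filename='083.gif')
msg.attach(img)

# the Gmail API "raw" field takes the urlsafe-base64 of the complete message
raw = base64.urlsafe_b64encode(msg.as_bytes()).decode()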
I am currently working on a Google Cloud project in free trial mode. I have a cron job that fetches data from a data vendor and stores it in the Datastore. I wrote the code a couple of weeks ago and it was all working fine, but all of a sudden, for the last two days, I have been receiving the error "DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded". I believe a cron job is supposed to time out only after 60 minutes; any idea why I am getting this error?
cron task
def run():
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source
        company_list = cron.rest_client.load(config, "companies", '')
        if not company_list:
            logging.info("Company list is empty")
            return "Ok"
        for row in company_list:
            company_repository.save(row, original_data_source, actual_data_source)
        return "OK"
    except Exception:
        logging.exception("Error occurred running the cron task")
        raise
Repository code
def save(dto, org_ds, act_dp):
    try:
        key = 'FIN/%s' % (dto['ticker'])
        company = CompanyInfo(id=key)
        company.stock_code = key
        company.ticker = dto['ticker']
        company.name = dto['name']
        company.original_data_source = org_ds
        company.actual_data_provider = act_dp
        company.put()
        return company
    except Exception:
        logging.exception("company_repository: error occurred saving the company record")
        raise
RestClient
def load(config, resource, filter):
    try:
        username = config['xxxx']['xxxx']
        password = config['xxxx']['xxxx']
        headers = {"Authorization": "Basic %s" % base64.b64encode(username + ":" + password)}
        if filter:
            from_date = filter['from']
            to_date = filter['to']
            ticker = filter['ticker']
            start_date = datetime.strptime(from_date, '%Y%m%d').strftime("%Y-%m-%d")
            end_date = datetime.strptime(to_date, '%Y%m%d').strftime("%Y-%m-%d")
        current_page = 1
        data = []
        while True:
            if filter:
                url = config['xxxx']["endpoints"][resource] % (ticker, current_page, start_date, end_date)
            else:
                url = config['xxxx']["endpoints"][resource] % (current_page)
            response = urlfetch.fetch(
                url=url,
                deadline=60,
                method=urlfetch.GET,
                headers=headers,
                follow_redirects=False,
            )
            if response.status_code != 200:
                logging.error("xxxx GET received status code %d!" % response.status_code)
                logging.error("error happened for url: %s with headers %s", url, headers)
                return 'Sorry, xxxx API request failed', 500
            db = json.loads(response.content)
            if not db['data']:
                break
            data.extend(db['data'])
            if db['total_pages'] == current_page:
                break
            current_page += 1
        return data
    except Exception:
        logging.exception("Error occurred with xxxx API request")
        raise
I'm guessing this is the same question as this, but now with more code:
DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded
I modified your code to write to the database after each urlfetch. If there are more pages, it relaunches itself in a deferred task, which should happen well before the 10-minute timeout.
Uncaught exceptions in a deferred task cause it to retry, so be mindful of that.
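For example, if a particular failure should not be retried, you can abort the task by raising deferred.PermanentTaskFailure (a real App Engine class; the error classification below is just illustrative):

from google.appengine.ext import deferred

def run(current_page=0):
    try:
        process_page(current_page)  # hypothetical stand-in for the work shown below
    except ValueError:
        # PermanentTaskFailure is logged as an error but NOT retried by the task queue
        raise deferred.PermanentTaskFailure('unrecoverable error on page %d' % current_page)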
It was unclear to me how actual_data_source & original_data_source worked, but I think you should be able to modify that part.
cron task
def run(current_page=0):
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source
        data, more = cron.rest_client.load(config, "companies", '', current_page)
        for row in data:
            company_repository.save(row, original_data_source, actual_data_source)
        # fetch the rest in a new deferred task, restarting the clock
        if more:
            deferred.defer(run, current_page + 1)
    except Exception as e:
        logging.exception("run() experienced an error: %s" % e)
RestClient
def load(config, resource, filter, current_page):
    try:
        username = config['xxxx']['xxxx']
        password = config['xxxx']['xxxx']
        headers = {"Authorization": "Basic %s" % base64.b64encode(username + ":" + password)}
        if filter:
            from_date = filter['from']
            to_date = filter['to']
            ticker = filter['ticker']
            start_date = datetime.strptime(from_date, '%Y%m%d').strftime("%Y-%m-%d")
            end_date = datetime.strptime(to_date, '%Y%m%d').strftime("%Y-%m-%d")
            url = config['xxxx']["endpoints"][resource] % (ticker, current_page, start_date, end_date)
        else:
            url = config['xxxx']["endpoints"][resource] % (current_page)
        response = urlfetch.fetch(
            url=url,
            deadline=60,
            method=urlfetch.GET,
            headers=headers,
            follow_redirects=False,
        )
        if response.status_code != 200:
            logging.error("xxxx GET received status code %d!" % response.status_code)
            logging.error("error happened for url: %s with headers %s", url, headers)
            return [], False
        db = json.loads(response.content)
        return db['data'], (db['total_pages'] != current_page)
    except Exception as e:
        logging.exception("Error occurred with xxxx API request: %s" % e)
        return [], False
I would prefer to write this as a comment, but I need more reputation to do that.
- What happens when you run the actual data fetch directly instead of through the cron job?
- Have you tried measuring the time delta from the start to the end of the job? (A minimal timing sketch follows this list.)
- Has the number of companies being retrieved increased dramatically?
- You appear to be doing some form of stock quote aggregation: is it possible that the provider has started blocking you?
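For the second point, a minimal timing sketch (fetch_and_save_companies is a hypothetical stand-in for the existing cron work):

import logging
import time

def run():
    started = time.time()
    try:
        fetch_and_save_companies()
    finally:
        # compare this against the ~10 minute limit for cron-triggered requests
        logging.info("cron job took %.1f seconds", time.time() - started)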
Delphi used: 2007.
Hello,
I have a simple web page with two text inputs and one file input. For the form to be submitted, both text inputs and the file input have to be filled in. With Synapse, I know how to upload a file (HttpPostFile) and how to post data (HttpMethod). However, I don't know how to do both at once.
After looking at the source code of Synapse, I guess I have to "format" my data with boundaries or something like that: one boundary section for my file input and another for my text inputs. I found an article on the subject, but it's about sending email attachments. I tried to reproduce what it described with Synapse, with no results.
Code for HttpPostFile:
function HttpPostFile(const URL, FieldName, FileName: string;
  const Data: TStream; const ResultData: TStrings): Boolean;
var
  HTTP: THTTPSend;
  Bound, s: string;
begin
  Bound := IntToHex(Random(MaxInt), 8) + '_Synapse_boundary';
  HTTP := THTTPSend.Create;
  try
    s := '--' + Bound + CRLF;
    s := s + 'content-disposition: form-data; name="' + FieldName + '";';
    s := s + ' filename="' + FileName + '"' + CRLF;
    s := s + 'Content-Type: Application/octet-string' + CRLF + CRLF;
    WriteStrToStream(HTTP.Document, s);
    HTTP.Document.CopyFrom(Data, 0);
    s := CRLF + '--' + Bound + '--' + CRLF;
    WriteStrToStream(HTTP.Document, s);
    HTTP.MimeType := 'multipart/form-data; boundary=' + Bound;
    Result := HTTP.HTTPMethod('POST', URL);
    if Result then
      ResultData.LoadFromStream(HTTP.Document);
  finally
    HTTP.Free;
  end;
end;
Thank you.
Your code is close. You are only sending your file field but not your text fields. To do all three, try this instead:
function HttpPostFile(const URL, InputText1FieldName, InputText1,
  InputText2FieldName, InputText2, InputFileFieldName,
  InputFileName: string; InputFileData: TStream;
  ResultData: TStrings): Boolean;
var
  HTTP: THTTPSend;
  Bound: string;
begin
  Bound := IntToHex(Random(MaxInt), 8) + '_Synapse_boundary';
  HTTP := THTTPSend.Create;
  try
    WriteStrToStream(HTTP.Document,
      '--' + Bound + CRLF +
      'Content-Disposition: form-data; name=' + AnsiQuotedStr(InputText1FieldName, '"') + CRLF +
      'Content-Type: text/plain' + CRLF +
      CRLF);
    WriteStrToStream(HTTP.Document, InputText1);
    WriteStrToStream(HTTP.Document,
      CRLF +
      '--' + Bound + CRLF +
      'Content-Disposition: form-data; name=' + AnsiQuotedStr(InputText2FieldName, '"') + CRLF +
      'Content-Type: text/plain' + CRLF +
      CRLF);
    WriteStrToStream(HTTP.Document, InputText2);
    WriteStrToStream(HTTP.Document,
      CRLF +
      '--' + Bound + CRLF +
      'Content-Disposition: form-data; name=' + AnsiQuotedStr(InputFileFieldName, '"') + ';' + CRLF +
      #9'filename=' + AnsiQuotedStr(InputFileName, '"') + CRLF +
      'Content-Type: application/octet-string' + CRLF +
      CRLF);
    HTTP.Document.CopyFrom(InputFileData, 0);
    WriteStrToStream(HTTP.Document,
      CRLF +
      '--' + Bound + '--' + CRLF);
    HTTP.MimeType := 'multipart/form-data; boundary=' + Bound;
    Result := HTTP.HTTPMethod('POST', URL);
    if Result then
      ResultData.LoadFromStream(HTTP.Document);
  finally
    HTTP.Free;
  end;
end;
If you switch to Indy, you can use its TIdMultipartFormDataStream class:
function HttpPostFile(const URL, InputText1FieldName, InputText1,
  InputText2FieldName, InputText2, InputFileFieldName,
  InputFileName: string; InputFileData: TStream;
  ResultData: TStrings): Boolean;
var
  HTTP: TIdHTTP;
  Input: TIdMultipartFormDataStream;
  Output: TMemoryStream;
begin
  Result := False;
  try
    Output := TMemoryStream.Create;
    try
      HTTP := TIdHTTP.Create;
      try
        Input := TIdMultipartFormDataStream.Create;
        try
          Input.AddFormField(InputText1FieldName, InputText1);
          Input.AddFormField(InputText2FieldName, InputText2);
          Input.AddFormField(InputFileFieldName, 'application/octet-stream', '', InputFileData, InputFileName);
          HTTP.Post(URL, Input, Output);
        finally
          Input.Free;
        end;
      finally
        HTTP.Free;
      end;
      Output.Position := 0;
      ResultData.LoadFromStream(Output);
      Result := True;
    finally
      Output.Free;
    end;
  except
    // swallow the exception and return False
  end;
end;
I also use Synapse in my projects. To make my work with Synapse simpler and faster, I wrote a THTTPSendEx class that needs only a minimum of code and adds more features.
Currently it's a beta version.
Its usage feels like Indy:
Create a THTTPSendEx instance.
Implement methods for the OnBeginWork, OnWork, and OnWorkEnd prototypes (see the .pas file) and assign them to the created instance. That's all you need; then just call the GET and POST functions of the class.
I also implemented multipart/form-data for fast file posting in the TMultipartFormDataStream class.
With it you can easily write files and fields.
Example of use:
var
  HTTP: THTTPSendEx;
  Data: TMultipartFormDataStream;
  sHTML: string; // HTML code received from the web
begin
  HTTP := THTTPSendEx.Create;
  Data := TMultipartFormDataStream.Create;
  try
    Data.AddFile('myFile', 'Path to the local file (no UNC paths)');
    Data.DataEnd;
    if HTTP.Post('URL HERE', Data, sHTML) then
    begin
      // Connection established; check the HTTP response
      if HTTP.IsSuccessfull then // HTTP returned "200 OK"
      begin
        ShowMessage('File successfully posted to the server.');
      end;
    end
    else
    begin
      ShowMessage('Cannot establish a connection to the server...' + #13 +
        'Network is not available or the server socket does not exist.');
    end;
  finally
    FreeAndNil(HTTP);
    FreeAndNil(Data);
  end;
end;
You can see it on my website.
If you have any ideas for it, please write them as a comment on the project page.
Sorry for any mistakes in my English.