Gatling to run scenario for each user/file

I am new to Gatling, so excuse my ignorance.
I have the following code.
For users 1 to 10, all the files in src/test/resources/data are posted over HTTP, BUT
I want to post one HTTP request per file.
E.g., below there are 10 files in src/test/resources/data, so when I say
rampUsersPerSec(1) to (10) during (60), one file should be posted per user.
Any leads are appreciated. Thank you.
val httpConf = http.baseUrl("http://localhost:8082/")
  .header("Accept", "application/json")

val files = getListOfFiles("src/test/resources/data")

val scenarioOne = scenario("Post feature extractor")
  .exec(
    files.map(fileName => runFeatureExtractor(fileName))
  )

setUp(
  scenarioOne.inject(
    nothingFor(5),
    rampUsersPerSec(1) to (10) during (60)
  )
).protocols(httpConf)

def runFeatureExtractor(fileName: String) = {
  exec(
    http("Extract features")
      .post("feature")
      .body(StringBody(fromFile(fileName).mkString)).asJson
      .check(status.is(200))
  )
}
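One possible lead (a sketch, not part of the original post): build an in-memory feeder from the file list and feed one record per virtual user, so every injected user posts exactly one file. This assumes getListOfFiles returns file paths as strings (as runFeatureExtractor suggests) and the Gatling 3.x Scala DSL; the key name "payload" is an illustrative choice.

// Sketch: circular feeder so each virtual user picks up the next file's content
val fileFeeder = files
  .map(fileName => Map("payload" -> scala.io.Source.fromFile(fileName).mkString))
  .toArray
  .circular

val scenarioOnePerFile = scenario("Post feature extractor - one file per user")
  .feed(fileFeeder)                          // one feeder record consumed per virtual user
  .exec(
    http("Extract features")
      .post("feature")
      .body(StringBody("${payload}")).asJson // Gatling EL resolves the fed value
      .check(status.is(200))
  )

With rampUsersPerSec(1) to (10) during (60), each injected user then sends a single request carrying the next file from the feeder. .circular wraps around once all ten files have been used; switch to .queue if each file must be posted exactly once (injection then stops with an error when the feeder runs dry).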

Related

How do I create a timestamp in an embed with pycord?

I would like my sent-message log, which is an embedded message, to have a timestamp, so it would have a footer like bot_name • Today at 10:48 PM.
Here is my current code,
@bot.event
async def on_message(ctx):
    if ctx.author.bot:
        return
    else:
        log_msg = bot.get_channel(1023451687857442828)
        embed = discord.Embed(
            title = "Sent Message",
            description = f"""This message was sent by {ctx.author.mention}, in {ctx.channel}\n**Message:** "{ctx.content}" """,
            color = discord.Colour.green(),
        )
        embed.set_author(name=ctx.author)
        embed.set_footer(text="bot_name")
        await log_msg.send(embed=embed)
You could do the following:
# (rest of your code)
now = datetime.now()  # get the current time as a datetime object
embed.set_footer(text=f'bot_name • Today at {now.strftime("%H:%M")}')  # set the embed footer with the formatted time
await log_msg.send(embed=embed)  # send the embed (already present in your original code)
If you don't have it imported already, you'll need either from datetime import datetime, or import datetime and then call datetime.datetime.now() instead of datetime.now().
You can use the message's created_at attribute to add a timestamp; since your on_message parameter (named ctx) is the message itself, that is ctx.created_at. In the context of your code it would look like this:
@bot.event
async def on_message(ctx):
    if ctx.author.bot:
        return
    else:
        log_msg = bot.get_channel(1023451687857442828)
        embed = discord.Embed(
            title = "Sent Message",
            description = f"""This message was sent by {ctx.author.mention}, in {ctx.channel}\n**Message:** "{ctx.content}" """,
            color = discord.Colour.green(),
            # message creation time added below
            timestamp = ctx.created_at
        )
        embed.set_author(name=ctx.author)
        embed.set_footer(text="bot_name")
        await log_msg.send(embed=embed)
Hope this helps

Stream of items, each producing n{ http request } > merge{response} > wrapInHeaderAndFooter{data} > http-request

I am trying to solve a classic ETL problem with streaming. I have a batch of segments; each segment holds the information about its associated records (number of records, URL to retrieve, etc.) needed to issue an HTTP request to collect the data. I need to extract the records from a source with a page size of 100 records, merge the pages for each segment, and wrap them in an XML header and footer. Each segment's XML payload is then sent to a target.
                  {http}
                  page 1
                 /      \
       seg 1 >  page 2   -> merge -> wrapHeaderAndFooter -> http target
      /          \      /
     /            page n
    /
   /
batch - seg 2      "     -> http target
    \
     \- seg n      "     -> http target
val loadSegment: Flow[Segment, Response, NotUsed] = {
  Flow[Segment].mapAsync(parallelism = 5) { segment =>
    val pages: Source[ByteString, NotUsed] = pagedPayload(segment).map(page => page.payload)
    // Using source concatenation to prepend and append
    val wrappedInXML: Source[ByteString, NotUsed] = xmlRootStartTag ++ pages ++ xmlRootEndTag
    val httpEntity: HttpEntity = HttpEntity(MediaTypes.`application/octet-stream`, pages)
    invokeTargetLoad(httpEntity, request, segment)
  }
}

def pagedPayload(segment: Segment): Source[Payload, NotUsed] = {
  val totalPages: Int = calculateTotalPages(segment.instanceCount)
  Source(0 until totalPages).mapAsyncUnordered(parallelism = 5)(i => {
    sendPayloadRequest(request, segment, i).mapTo[Try[Payload]].map(_.get)
  })
}

val batch: Batch = someBatch

Source(batch.segments)
  .via(loadSegment)
  .runWith(Sink.ignore)
  .andThen {
    case Success(value) => log("success")
    case Failure(error) => report(error)
  }
Is there a better approach? I am trying to use HttpEntity.Chunked encoding to stream the pages. Sometimes the first request to the source takes longer because of warm-up, and the target truncates the stream with no data. Is there a way to delay the actual connection to the target until the first page is available in the stream?
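One way to do that (a sketch, not from the original post): materialize only the first page before building the request, so the target connection is opened once data is known to exist. This assumes an implicit Materializer and ExecutionContext are in scope, that xmlRootStartTag/xmlRootEndTag are Source[ByteString, NotUsed] as in the snippet above, and that invokeTargetLoad returns a Future[Response].

// Sketch: wait for the first page, then stream header ++ first page ++ remaining pages ++ footer
// (uses akka.http.scaladsl.model.{ContentTypes, HttpEntity} and akka.stream.scaladsl.{Flow, Sink, Source})
val loadSegmentLazily: Flow[Segment, Response, NotUsed] =
  Flow[Segment].mapAsync(parallelism = 5) { segment =>
    val pages: Source[ByteString, NotUsed] = pagedPayload(segment).map(_.payload)
    pages.prefixAndTail(1).runWith(Sink.head).flatMap { case (firstPage, remainingPages) =>
      val body   = xmlRootStartTag ++ Source(firstPage.toList) ++ remainingPages ++ xmlRootEndTag
      val entity = HttpEntity.Chunked.fromData(ContentTypes.`application/octet-stream`, body)
      invokeTargetLoad(entity, request, segment)
    }
  }

Sink.head completes only after prefixAndTail(1) has seen the first page, so the request (and hence the connection to the target) is never issued against an empty stream.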
I would have preferred something like the pipeline below. If it's possible, how would I implement the methods wrapXMLHeader and toHttpEntity?
val splitPages: Flow[BuildSequenceSegment, Seq[PageRequest], NotUsed] = ???
val requestPayload: Flow[Seq[PageRequest], Seq[PageResponse], NotUsed] = ???
val wrapXMLHeader: Flow[Seq[PageResponse], Seq[PageResponse], NotUsed] = ???
val toHttpEntity: Flow[Seq[PageResponse], HttpEntity.Chunked, NotUsed] = ???
val invokeTargetLoad: Flow[HttpEntity.Chunked, RestResponse, NotUsed] = ???

Source(batch.segments)
  .via(splitPages)
  .via(requestPayload)
  .via(wrapXMLHeader)
  .via(toHttpEntity)
  .via(invokeTargetLoad)
  .runWith(Sink.ignore)
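If PageResponse exposes its bytes (assumed here as a payload: ByteString field, which is not spelled out in the post), wrapXMLHeader and toHttpEntity can be collapsed into one flow that frames the pages with the root tags and emits a chunked entity. A rough sketch under those assumptions, with illustrative header/footer literals:

import akka.NotUsed
import akka.http.scaladsl.model.{ContentTypes, HttpEntity}
import akka.stream.scaladsl.{Flow, Source}
import akka.util.ByteString

val xmlHeader = ByteString("<records>")   // placeholder root start tag
val xmlFooter = ByteString("</records>")  // placeholder root end tag

val toHttpEntity: Flow[Seq[PageResponse], HttpEntity.Chunked, NotUsed] =
  Flow[Seq[PageResponse]].map { pages =>
    val chunks: Source[ByteString, NotUsed] =
      Source.single(xmlHeader) ++ Source(pages.map(_.payload).toList) ++ Source.single(xmlFooter)
    HttpEntity.Chunked.fromData(ContentTypes.`application/octet-stream`, chunks)
  }

Note that because requestPayload already collects all of a segment's pages into a Seq, the chunked entity here only affects wire framing; keeping the pages as a Source (as in the earlier snippet) is what preserves end-to-end streaming and back-pressure.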

Batch Request for get in gmail API

I have a list of around 2500 mail IDs and I'm restricted to using only the requests library. So far I do it this way to get the mail headers:
mail_ids = ['']
for mail_id in mail_ids:
    res = requests.get(
        'https://www.googleapis.com/gmail/v1/users/me/messages/{}?format=metadata'.format(mail_id),
        headers=headers).json()
    mail_headers = res['payload']['headers']
    ...
But it's very inefficient, and I would rather POST a list of IDs instead; however, in the documentation (https://developers.google.com/gmail/api/v1/reference/users/messages/get) I don't see a batch get. Any workaround? I'm using the Flask framework. Thanks a lot.
This is a bit late, but in case it helps anyone, here's the code I used to do a batch get of emails:
First I get a list of relevant emails. Change the request according to your needs; I'm getting only sent emails for a certain time period:
query = "https://www.googleapis.com/gmail/v1/users/me/messages?labelIds=SENT&q=after:2020-07-25 before:2020-07-31"
response = requests.get(query, headers=header)
events = json.loads(response.content)
email_tokens = events['messages']
while 'nextPageToken' in events:
    response = requests.get(query + f"&pageToken={events['nextPageToken']}",
                            headers=header)
    events = json.loads(response.content)
    email_tokens += events['messages']
Then I'm batching a get request to get 100 emails at a time, and parsing only the json part of the email and putting it into a list called emails. Note that there's some repeated code here, so you may want to refactor it into a method. You'll have to set your access token here:
emails = []
access_token = '1234'
header = {'Authorization': 'Bearer ' + access_token}
batch_header = header.copy()
batch_header['Content-Type'] = 'multipart/mixed; boundary="email_id"'
data = ''
ctr = 0
for token_dict in email_tokens:
    data += f'--email_id\nContent-Type: application/http\n\nGET /gmail/v1/users/me/messages/{token_dict["id"]}?format=full\n\n'
    if ctr == 99:
        data += '--email_id--'
        print(data)
        r = requests.post(f"https://www.googleapis.com/batch/gmail/v1",
                          headers=batch_header, data=data)
        bodies = r.content.decode().split('\r\n')
        for body in bodies:
            if body.startswith('{'):
                parsed_body = json.loads(body)
                emails.append(parsed_body)
        ctr = 0
        data = ''
        continue
    ctr += 1
data += '--email_id--'
r = requests.post(f"https://www.googleapis.com/batch/gmail/v1",
                  headers=batch_header, data=data)
bodies = r.content.decode().split('\r\n')
for body in bodies:
    if body.startswith('{'):
        parsed_body = json.loads(body)
        emails.append(parsed_body)
[Optional] Finally, I'm decoding the text in the email and storing only the last sent email instead of the whole thread. The regex used here splits on strings that I found were usually at the end of emails, for instance "On Tue, Jun 23, 2020, x@gmail.com said...":
import re
import base64

gmail_split_regex = r'On [a-zA-z]{3}, ([a-zA-z]{3}|\d{2}) ([a-zA-z]{3}|\d{2}),? \d{4}'

for email in emails:
    if 'parts' not in email['payload']:
        continue
    for part in email['payload']['parts']:
        if part['mimeType'] == 'text/plain':
            if 'uniqueBody' not in email:
                plainText = str(base64.urlsafe_b64decode(bytes(str(part['body']['data']), encoding='utf-8')))
                email['uniqueBody'] = {'content': re.split(gmail_split_regex, plainText)[0]}
        elif 'parts' in part:
            for sub_part in part['parts']:
                if sub_part['mimeType'] == 'text/plain':
                    if 'uniqueBody' not in email:
                        plainText = str(base64.urlsafe_b64decode(bytes(str(sub_part['body']['data']), encoding='utf-8')))
                        email['uniqueBody'] = {'content': re.split(gmail_split_regex, plainText)[0]}

cron job throwing DeadlineExceededError

I am currently working on a Google Cloud project in free-trial mode. I have a cron job that fetches data from a data vendor and stores it in the datastore. I wrote the code a couple of weeks ago and it was all working fine, but all of a sudden I started receiving the error "DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded" for the last two days. I believe a cron job is supposed to time out only after 60 minutes; any idea why I am getting this error?
cron task
def run():
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source
        company_list = cron.rest_client.load(config, "companies", '')
        if not company_list:
            logging.info("Company list is empty")
            return "Ok"
        for row in company_list:
            company_repository.save(row, original_data_source, actual_data_source)
        return "OK"
Repository code
def save(dto, org_ds, act_dp):
    try:
        key = 'FIN/%s' % (dto['ticker'])
        company = CompanyInfo(id=key)
        company.stock_code = key
        company.ticker = dto['ticker']
        company.name = dto['name']
        company.original_data_source = org_ds
        company.actual_data_provider = act_dp
        company.put()
        return company
    except Exception:
        logging.exception("company_repository: error occurred saving the company record")
        raise
RestClient
def load(config, resource, filter):
    try:
        username = config['xxxx']['xxxx']
        password = config['xxxx']['xxxx']
        headers = {"Authorization": "Basic %s" % base64.b64encode(username + ":" + password)}
        if filter:
            from_date = filter['from']
            to_date = filter['to']
            ticker = filter['ticker']
            start_date = datetime.strptime(from_date, '%Y%m%d').strftime("%Y-%m-%d")
            end_date = datetime.strptime(to_date, '%Y%m%d').strftime("%Y-%m-%d")
        current_page = 1
        data = []
        while True:
            if filter:
                url = config['xxxx']["endpoints"][resource] % (ticker, current_page, start_date, end_date)
            else:
                url = config['xxxx']["endpoints"][resource] % (current_page)
            response = urlfetch.fetch(
                url=url,
                deadline=60,
                method=urlfetch.GET,
                headers=headers,
                follow_redirects=False,
            )
            if response.status_code != 200:
                logging.error("xxxx GET received status code %d!" % (response.status_code))
                logging.error("error happened for url: %s with headers %s", url, headers)
                return 'Sorry, xxxx API request failed', 500
            db = json.loads(response.content)
            if not db['data']:
                break
            data.extend(db['data'])
            if db['total_pages'] == current_page:
                break
            current_page += 1
        return data
    except Exception:
        logging.exception("Error occurred with xxxx API request")
        raise
I'm guessing this is the same question as this, but now with more code:
DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded
I modified your code to write to the database after each urlfetch. If there are more pages, it relaunches itself in a deferred task, which should happen well before the 10-minute timeout.
Uncaught exceptions in a deferred task cause it to retry, so be mindful of that.
It was unclear to me how actual_data_source and original_data_source are used, but I think you should be able to adapt that part.
crontask
def run(current_page=0):
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source
        data, more = cron.rest_client.load(config, "companies", '', current_page)

        for row in data:
            company_repository.save(row, original_data_source, actual_data_source)

        # fetch the rest
        if more:
            deferred.defer(run, current_page + 1)
    except Exception as e:
        logging.exception("run() experienced an error: %s" % e)
RestClient
def load(config, resource, filter, current_page):
    try:
        username = config['xxxx']['xxxx']
        password = config['xxxx']['xxxx']
        headers = {"Authorization": "Basic %s" % base64.b64encode(username + ":" + password)}
        if filter:
            from_date = filter['from']
            to_date = filter['to']
            ticker = filter['ticker']
            start_date = datetime.strptime(from_date, '%Y%m%d').strftime("%Y-%m-%d")
            end_date = datetime.strptime(to_date, '%Y%m%d').strftime("%Y-%m-%d")
            url = config['xxxx']["endpoints"][resource] % (ticker, current_page, start_date, end_date)
        else:
            url = config['xxxx']["endpoints"][resource] % (current_page)

        response = urlfetch.fetch(
            url=url,
            deadline=60,
            method=urlfetch.GET,
            headers=headers,
            follow_redirects=False,
        )
        if response.status_code != 200:
            logging.error("xxxx GET received status code %d!" % (response.status_code))
            logging.error("error happened for url: %s with headers %s", url, headers)
            return [], False

        db = json.loads(response.content)
        return db['data'], (db['total_pages'] != current_page)
    except Exception as e:
        logging.exception("Error occurred with xxxx API request: %s" % e)
        return [], False
I would prefer to write this as a comment, but I need more reputation to do that.
What happens when you run the actual data fetch directly instead of through the cron job?
Have you tried measuring a time delta from the start to the end of the job?
Has the number of companies being retrieved increased dramatically?
You appear to be doing some form of stock quote aggregation - is it possible that the provider has started blocking you?

Scrapy: Why should I use yield for multiple requests?

Simply put, I need three things:
1) Log in
2) Multiple requests
3) Synchronous requests (sequential, like in C)
I realized 'yield' should be used for multiple requests,
but 'yield' works differently from C and is not sequential.
So I want to issue requests without 'yield', as below,
but then the crawl method is never called.
How can I call the crawl method sequentially, like in C?
class HotdaySpider(scrapy.Spider):
    name = "hotday"
    allowed_domains = ["test.com"]
    login_page = "http://www.test.com"
    start_urls = ["http://www.test.com"]
    maxnum = 27982
    runcnt = 10

    def parse(self, response):
        return [FormRequest.from_response(response, formname='login_form', formdata={'id': 'id', 'password': 'password'}, callback=self.after_login)]

    def after_login(self, response):
        global maxnum
        global runcnt
        i = 0
        while i < runcnt:
            Request(url="http://www.test.com/view.php?idx=" + str(maxnum) + "/", callback=self.crawl)
            i = i + 1

    def crawl(self, response):
        global maxnum
        filename = 'hotday.html'
        with open(filename, 'wb') as f:
            f.write(unicode(response.body.decode(response.encoding)).encode('utf-8'))
        maxnum = maxnum + 1
When you return a list of requests (which is what you do when you yield many of them), Scrapy will schedule them and you can't control the order in which the responses will come.
If you want to process one response at a time and in order, you would have to return only one request in your after_login method and construct the next request in your crawl method.
def after_login(self, response):
    return Request(url="http://www.test.com/view.php?idx=0/", callback=self.crawl)

def crawl(self, response):
    global maxnum
    global runcnt
    filename = 'hotday.html'
    with open(filename, 'wb') as f:
        f.write(unicode(response.body.decode(response.encoding)).encode('utf-8'))
    maxnum = maxnum + 1
    next_page = int(re.search('\?idx=(\d*)', response.request.url).group(1)) + 1
    if next_page < runcnt:
        return Request(url="http://www.test.com/view.php?idx=" + str(next_page) + "/", callback=self.crawl)
