Unable to extract/list all event logs on a Watson Assistant workspace - ibm-watson

Please help. I was trying to call the Watson Assistant endpoint
https://gateway.watsonplatform.net/assistant/api/v1/workspaces/myworkspace/logs?version=2018-09-20 to get the full list of events,
filtering by date range with these params:
var param = {
  workspace_id: '{myworkspace}',
  page_limit: 100000,
  filter: 'response_timestamp%3C2018-17-12,response_timestamp%3E2019-01-01'
};
but apparently I got the empty response below:
{
"logs": [],
"pagination": {}
}

A couple of things to check:
1. You have 2018-17-12, which is not a valid date; read as YYYY-MM-DD it would mean "the 12th day of the 17th month of 2018".
2. Even assuming the date is corrected, your filter says "documents before 17th Dec 2018 AND after 1st Jan 2019", which would return no documents (see the corrected filter sketch after this list).
3. Logs are only generated when you call the message() method through the API, so check the logging page in the tooling to see whether you have any logs at all.
4. If you have a Lite account, logs are only stored for 7 days and then deleted. To keep logs longer you need to upgrade to a Standard account.
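For example, a filter covering December 2018 would keep the same URL-encoded > and < operators but put the dates in a sensible order (a sketch only; adjust the dates to the range you actually want):
filter: 'response_timestamp%3E2018-12-01,response_timestamp%3C2019-01-01'
// decoded: response_timestamp>2018-12-01,response_timestamp<2019-01-01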
Although not directly related to your issue, be aware that page_limit has an upper hard-coded limit (IIRC 200-300?). So you may ask for 100,000 records, but it won't give them to you.
This is sample Python code (unsupported) that uses pagination to read the logs:
from urllib.parse import urlparse, parse_qs
from watson_developer_cloud import AssistantV1

username = '...'
password = '...'
workspace_id = '....'
url = '...'
version = '2018-09-20'

c = AssistantV1(url=url, version=version, username=username, password=password)

totalpages = 999
pagelimit = 200

logs = []          # collects one page of log events per entry
page_count = 1
cursor = None
count = 0
x = {'pagination': 'DUMMY'}
while x['pagination']:
    if page_count > totalpages:
        break
    print('Reading page {}. '.format(page_count), end='')
    x = c.list_logs(workspace_id=workspace_id, cursor=cursor, page_limit=pagelimit)
    if x is None:
        break
    print('Status: {}'.format(x.get_status_code()))
    x = x.get_result()
    logs.append(x['logs'])
    count = count + len(x['logs'])
    page_count = page_count + 1
    if 'pagination' in x and 'next_url' in x['pagination']:
        # Pull the cursor for the next page out of the pagination URL.
        p = x['pagination']['next_url']
        u = urlparse(p)
        query = parse_qs(u.query)
        cursor = query['cursor'][0]
    else:
        break  # no next_url means there are no more pages
Your logs list should then contain the logs (one page of events per entry).
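As a hedged aside, the SDK's list_logs() also accepts the same filter string directly, so once the date range is fixed you could restrict the pull to that window (a sketch, using the corrected filter from above; the SDK handles the URL encoding for you):
x = c.list_logs(workspace_id=workspace_id,
                cursor=cursor,
                page_limit=pagelimit,
                filter='response_timestamp>2018-12-01,response_timestamp<2019-01-01')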

I believe the limit is 500, and then we return a pagination URL so you can get the next 500. I don't think this is the issue, but once you start getting logs back it's good to know.

Related

lapply calling .csv for changes in a parameter

Good afternoon
I am currently trying to pull some data from pushshift but I am maxing out at 100 posts. Below is the code for pulling one day that works great.
testdata1<-getPushshiftData(postType = "submission", size = 1000, before = "1546300800", after= "1546200800", subreddit = "mysubreddit", nest_level = 1)
I have a list of Unix timestamps (UTCs) for the beginning and end of each day for a month. What I would like is the syntax to substitute the "after" and "before" values for each day and append each day's results to the pulled data. Even if it put the data into a bunch of separate smaller datasets, I could work with it.
Here is my (feeble) attempt. "links" is the data frame with the UTCs:
mydata<- lapply(1:30, function(x) getPushshiftData(postType = "submission", size = 1000, after= links$utcstart[,x],before = links$utcendstart[,x], subreddit = "mysubreddit", nest_level = 1))
Here is the error message I get: Error in links$utcstart[, x] : incorrect number of dimensions
I've also tried without the "function (x)" argument and get the following message:
Error in ifelse(is.null(after), "", sprintf("&after=%s", after)) :
object 'x' not found
Can anyone help with this?

Player command failed: Premium required. However I have premium

I am using a family account (premium) and this code returns a 'Premium required' error. My code is as follows:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

device_id = '0d1841b0976bae2a3a310dd74c0f3df354899bc8'

def playSpotify():
    client_credentials_manager = SpotifyClientCredentials(client_id='<REDACTED>', client_secret='<REDACTED>')
    sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
    playlists = sp.user_playlists('gh8gflxedxmp4tv2he2gp92ev')
    #while playlists:
    #    for i, playlist in enumerate(playlists['items']):
    #        print("%4d %s %s" % (i + 1 + playlists['offset'], playlist['uri'], playlist['name']))
    #    if playlists['next']:
    #        playlists = sp.next(playlists)
    #    else:
    #        playlists = None
    #sp.shuffle(true, device_id=device_id)
    #sp.repeat(true, device_id=device_id)
    sp.start_playback(device_id=device_id, context_uri='spotify:playlist:4ndG2qFEFt1YYcHYt3krjv')
When using SpotifyClientCredentials the token that is generated doesn't belong to any user but to an app, hence the error message.
What you need to do is use SpotifyOAuth instead. So to initialize spotipy, just do:
sp = spotipy.Spotify(auth_manager=spotipy.SpotifyOAuth())
This will open a browser tab and require you to sign in to your account.
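As a hedged sketch of that setup (the redirect URI is a placeholder, and playback control requires the user-modify-playback-state scope):
import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Sketch only: fill in your own app credentials and redirect URI.
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
    client_id='<REDACTED>',
    client_secret='<REDACTED>',
    redirect_uri='http://localhost:8888/callback',
    scope='user-modify-playback-state',   # needed for start_playback
))

# device_id reused from the question's code above.
sp.start_playback(device_id=device_id, context_uri='spotify:playlist:4ndG2qFEFt1YYcHYt3krjv')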

Batch Request for get in gmail API

I have a list of around 2,500 mail IDs and I'm stuck with using only the requests library, so far I get the mail headers this way:
import requests

mail_ids = ['']
for mail_id in mail_ids:
    res = requests.get(
        'https://www.googleapis.com/gmail/v1/users/me/messages/{}?format=metadata'.format(mail_id),
        headers=headers).json()
    mail_headers = res['payload']['headers']
    ...
But this is very inefficient, and I would rather POST a list of IDs instead; in the documentation https://developers.google.com/gmail/api/v1/reference/users/messages/get I don't see a BatchGet. Any workaround? I'm using the Flask framework. Thanks a lot.
This is a bit late, but in case it helps anyone, here's the code I used to do a batch get of emails:
First I get a list of relevant emails. Change the request according to your needs; I'm getting only sent emails for a certain time period:
query = "https://www.googleapis.com/gmail/v1/users/me/messages?labelIds=SENT&q=after:2020-07-25 before:2020-07-31"
response = requests.get(query, headers=header)
events = json.loads(response.content)
email_tokens = events['messages']
while 'nextPageToken' in events:
response = requests.get(query+f"&pageToken={events['nextPageToken']}",
headers=header)
events = json.loads(response.content)
email_tokens += events['messages']
Then I batch a GET request to fetch 100 emails at a time, parse only the JSON part of each response, and put it into a list called emails. Note that there's some repeated code here, so you may want to refactor it into a method (see the sketch after this block). You'll have to set your access token here:
emails = []
access_token = '1234'
header = {'Authorization': 'Bearer ' + access_token}

# The batch endpoint expects a multipart/mixed body; each part is one GET request.
batch_header = header.copy()
batch_header['Content-Type'] = 'multipart/mixed; boundary="email_id"'

data = ''
ctr = 0
for token_dict in email_tokens:
    data += f'--email_id\nContent-Type: application/http\n\nGET /gmail/v1/users/me/messages/{token_dict["id"]}?format=full\n\n'
    if ctr == 99:
        # Flush a full batch of 100 requests.
        data += '--email_id--'
        print(data)
        r = requests.post("https://www.googleapis.com/batch/gmail/v1",
                          headers=batch_header, data=data)
        bodies = r.content.decode().split('\r\n')
        for body in bodies:
            if body.startswith('{'):
                parsed_body = json.loads(body)
                emails.append(parsed_body)
        ctr = 0
        data = ''
        continue
    ctr += 1

# Send whatever is left over after the last full batch.
data += '--email_id--'
r = requests.post("https://www.googleapis.com/batch/gmail/v1",
                  headers=batch_header, data=data)
bodies = r.content.decode().split('\r\n')
for body in bodies:
    if body.startswith('{'):
        parsed_body = json.loads(body)
        emails.append(parsed_body)
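As a hedged sketch of that refactor (the post_batch name is mine, not from the original code), the repeated POST-and-parse step could be pulled out like this:
def post_batch(data, batch_header, emails):
    # POST one multipart batch and append every JSON body in the response to `emails`.
    r = requests.post("https://www.googleapis.com/batch/gmail/v1",
                      headers=batch_header, data=data)
    for body in r.content.decode().split('\r\n'):
        if body.startswith('{'):
            emails.append(json.loads(body))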
[Optional] Finally, I decode the text in each email and store only the last sent message instead of the whole thread. The regex used here splits on strings that I found were usually at the end of emails, for instance "On Tue, Jun 23, 2020, x@gmail.com said...":
import re
import base64

# Splits on reply headers such as "On Tue, Jun 23, 2020 ..." so only the newest text is kept.
gmail_split_regex = r'On [a-zA-Z]{3}, ([a-zA-Z]{3}|\d{2}) ([a-zA-Z]{3}|\d{2}),? \d{4}'

for email in emails:
    if 'parts' not in email['payload']:
        continue
    for part in email['payload']['parts']:
        if part['mimeType'] == 'text/plain':
            if 'uniqueBody' not in email:
                plainText = str(base64.urlsafe_b64decode(bytes(str(part['body']['data']), encoding='utf-8')))
                email['uniqueBody'] = {'content': re.split(gmail_split_regex, plainText)[0]}
        elif 'parts' in part:
            # Multipart parts can nest one level deeper (e.g. multipart/alternative).
            for sub_part in part['parts']:
                if sub_part['mimeType'] == 'text/plain':
                    if 'uniqueBody' not in email:
                        plainText = str(base64.urlsafe_b64decode(bytes(str(sub_part['body']['data']), encoding='utf-8')))
                        email['uniqueBody'] = {'content': re.split(gmail_split_regex, plainText)[0]}
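For example, a quick way to check the result (a sketch; it just prints the first part of each extracted body):
# Print the first 200 characters of the newest text of the first few emails.
for email in emails[:3]:
    print(email.get('uniqueBody', {}).get('content', '')[:200])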

cron job throwing DeadlineExceededError

I am currently working on a Google Cloud project in free trial mode. I have a cron job that fetches data from a data vendor and stores it in the Datastore. I wrote the code a couple of weeks ago and it was all working fine, but for the last two days I have suddenly been receiving the error "DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded". I believe a cron job is supposed to time out only after 60 minutes, so any idea why I am getting this error?
cron task
def run():
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source
        company_list = cron.rest_client.load(config, "companies", '')
        if not company_list:
            logging.info("Company list is empty")
            return "Ok"
        for row in company_list:
            company_repository.save(row, original_data_source, actual_data_source)
        return "OK"
    except Exception:  # except clause not shown in the original snippet; added so the try block parses
        logging.exception("run() failed")
        raise
Repository code
def save(dto, org_ds, act_dp):
    try:
        key = 'FIN/%s' % (dto['ticker'])
        company = CompanyInfo(id=key)
        company.stock_code = key
        company.ticker = dto['ticker']
        company.name = dto['name']
        company.original_data_source = org_ds
        company.actual_data_provider = act_dp
        company.put()
        return company
    except Exception:
        logging.exception("company_repository: error occurred saving the company record")
        raise
RestClient
def load(config, resource, filter):
    try:
        username = config['xxxx']['xxxx']
        password = config['xxxx']['xxxx']
        headers = {"Authorization": "Basic %s" % base64.b64encode(username + ":" + password)}
        if filter:
            from_date = filter['from']
            to_date = filter['to']
            ticker = filter['ticker']
            start_date = datetime.strptime(from_date, '%Y%m%d').strftime("%Y-%m-%d")
            end_date = datetime.strptime(to_date, '%Y%m%d').strftime("%Y-%m-%d")
        current_page = 1
        data = []
        while True:
            if filter:
                url = config['xxxx']["endpoints"][resource] % (ticker, current_page, start_date, end_date)
            else:
                url = config['xxxx']["endpoints"][resource] % (current_page)
            response = urlfetch.fetch(
                url=url,
                deadline=60,
                method=urlfetch.GET,
                headers=headers,
                follow_redirects=False,
            )
            if response.status_code != 200:
                logging.error("xxxx GET received status code %d!" % (response.status_code))
                logging.error("error happened for url: %s with headers %s", url, headers)
                return 'Sorry, xxxx API request failed', 500
            db = json.loads(response.content)
            if not db['data']:
                break
            data.extend(db['data'])
            if db['total_pages'] == current_page:
                break
            current_page += 1
        return data
    except Exception:
        logging.exception("Error occurred with xxxx API request")
        raise
I'm guessing this is the same question as this, but now with more code:
DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded
I modified your code to write to the database after each urlfetch. If there are more pages, it relaunches itself in a deferred task, which should finish well before the 10-minute timeout.
Uncaught exceptions in a deferred task cause it to retry, so be mindful of that.
It was unclear to me how actual_data_source & original_data_source worked, but I think you should be able to modify that part.
cron task
def run(current_page=0):
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source
        data, more = cron.rest_client.load(config, "companies", '', current_page)
        for row in data:
            company_repository.save(row, original_data_source, actual_data_source)
        # fetch the rest
        if more:
            deferred.defer(run, current_page + 1)
    except Exception as e:
        logging.exception("run() experienced an error: %s" % e)
RestClient
def load(config, resource, filter, current_page):
    try:
        username = config['xxxx']['xxxx']
        password = config['xxxx']['xxxx']
        headers = {"Authorization": "Basic %s" % base64.b64encode(username + ":" + password)}
        if filter:
            from_date = filter['from']
            to_date = filter['to']
            ticker = filter['ticker']
            start_date = datetime.strptime(from_date, '%Y%m%d').strftime("%Y-%m-%d")
            end_date = datetime.strptime(to_date, '%Y%m%d').strftime("%Y-%m-%d")
            url = config['xxxx']["endpoints"][resource] % (ticker, current_page, start_date, end_date)
        else:
            url = config['xxxx']["endpoints"][resource] % (current_page)
        response = urlfetch.fetch(
            url=url,
            deadline=60,
            method=urlfetch.GET,
            headers=headers,
            follow_redirects=False,
        )
        if response.status_code != 200:
            logging.error("xxxx GET received status code %d!" % (response.status_code))
            logging.error("error happened for url: %s with headers %s", url, headers)
            return [], False
        db = json.loads(response.content)
        return db['data'], (db['total_pages'] != current_page)
    except Exception as e:
        logging.exception("Error occurred with xxxx API request: %s" % e)
        return [], False
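One assumption in the sketch above: the deferred library has to be available in your app. On the App Engine Python 2.7 standard runtime that typically means the import below plus enabling the builtin in app.yaml; remember that uncaught exceptions make the task retry, while raising deferred.PermanentTaskFailure stops the retries.
from google.appengine.ext import deferred

# app.yaml (sketch):
# builtins:
# - deferred: on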
I would prefer to write this as a comment, but I need more reputation to do that.
1. What happens when you run the actual data fetch directly instead of through the cron job?
2. Have you tried measuring a time delta from the start to the end of the job? (See the timing sketch after this list.)
3. Has the number of companies being retrieved increased dramatically?
4. You appear to be doing some form of stock quote aggregation - is it possible that the provider has started blocking you?
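For the second point, a minimal timing sketch (it just wraps the existing run() entry point; the log message wording is mine):
import time
import logging

start = time.time()
run()  # the cron entry point shown above
logging.info("cron run took %.1f seconds", time.time() - start)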

How can I get the logs of past months in a Postgres DB?

Issue:
Someone has added a junk column to one of my tables. I want to figure out from the logs when and from where this was done.
Please help with this issue.
Make sure logging is enabled in postgresql.conf:
1. log_destination = 'stderr' # or 'stderr,csvlog,syslog'
2. logging_collector = on # needs a restart
3. log_directory = 'pg_log'
4. log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
5. log_rotation_age = 1d
6. log_rotation_size = 10MB
7. log_min_error_statement = error
8. log_min_duration_statement = 5000 # -1 = disable; 0 = all; 5000 = 5 sec
9. log_line_prefix = '|%m|%r|%d|%u|%e|'
10. log_statement = 'ddl' # 'none' | 'ddl' | 'mod' | 'all'
# prefer 'ddl' so the log captures DDL plus any statement slower than the minimum duration
If you don't have logging enabled, make sure to enable it now (it only helps going forward).
If you don't have any logs, the last resort is to run pg_xlogdump on your xlog (WAL) files under pg_xlog and look for the DDL.
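Once log_statement = 'ddl' is in effect, a rough sketch of scanning the log directory for the offending statement (the directory path is an assumption; adjust it to your data directory):
import glob
import re

# Look through the rotated log files for DDL that touched a table.
for path in glob.glob('/var/lib/postgresql/data/pg_log/postgresql-*.log'):
    with open(path) as f:
        for line in f:
            if re.search(r'\b(ALTER|CREATE|DROP)\s+TABLE\b', line, re.IGNORECASE):
                print(path, line.rstrip())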
