Matomo API "Actions.getPageUrls" returns only 100 rows on REST API call - matomo

I am trying to fetch data from the Matomo API method "Actions.getPageUrls" using the code below:

import requests
import pandas as pd

api_url = "baseapi"
PARAMS = {'module': 'API',
          'method': 'Actions.getPageUrls',
          'period': 'range',
          'date': '2019-01-01,2020-01-01',
          'filter_limit': '-1',
          'idSite': '1',
          'format': 'JSON',
          'expanded': '1',
          'token_auth': 'token'}

r = requests.post(url=api_url, params=PARAMS, verify=False)
print(r.url)
matomo_df = pd.DataFrame(r.json())
matomo_df.head()
matomo_df['label']
matomo_df = pd.DataFrame(r.json()[0]['subtable'])
matomo_df
But it returns only 100 rows. How can I get more than 100 rows?

By default the API returns only 100 rows; when you set 'filter_limit' to -1 it is supposed to return all rows. Try setting the 'filter_limit' param to an explicit large value such as 10000 instead.
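
If -1 still caps out at 100 (some Matomo installations enforce a server-side maximum per request), paging through the results is a safe fallback. Below is a minimal sketch, assuming the same placeholder URL and token as in the question and Matomo's standard filter_limit/filter_offset parameters:

import requests
import pandas as pd

api_url = "baseapi"  # placeholder, as in the question

def fetch_page_urls(limit, offset):
    # One page of Actions.getPageUrls, using filter_limit/filter_offset.
    params = {
        'module': 'API',
        'method': 'Actions.getPageUrls',
        'period': 'range',
        'date': '2019-01-01,2020-01-01',
        'idSite': '1',
        'format': 'JSON',
        'expanded': '1',
        'filter_limit': limit,    # explicit large limit instead of -1
        'filter_offset': offset,  # where this page starts
        'token_auth': 'token',    # placeholder
    }
    return requests.post(api_url, params=params, verify=False).json()

# Keep requesting pages until a short page signals the end.
rows, offset, page_size = [], 0, 10000
while True:
    page = fetch_page_urls(page_size, offset)
    rows.extend(page)
    if len(page) < page_size:
        break
    offset += page_size

matomo_df = pd.DataFrame(rows)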

Related

How to retrieve more than 50 records using Spotipy API

I'm using the Spotipy API to retrieve song data from Spotify. Here's my code:
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id='<my_client_id>',
                                                           client_secret='<my_client_secret>'))

results = sp.search(q="artist:guns n' roses", limit=50)
d = []
for idx, track in enumerate(results['tracks']['items']):
    d.append({
        'Track': track['name'],
        'Album': track['album']['name'],
        'Artist': track['artists'][0]['name'],
        'Release Date': track['album']['release_date'],
        'Track Number': track['track_number'],
        'Popularity': track['popularity'],
        'Explicit': track['explicit'],
        'Duration': track['duration_ms'],
        'Audio Preview URL': track['preview_url'],
        'Album URL': track['album']['external_urls']['spotify']
    })
pd.DataFrame(d)
Per the docs, it appears that Spotify has a limit of 50 records per request.
Is it possible to retrieve all records for a given string search? (e.g. by chunking requests, etc.)
Thanks!
The Spotify Web API can return a maximum of 1000 items. (In this example, it found 390 tracks, so it got all of them.)
Here is the code to get them:
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id='<my_client_id>',
                                                           client_secret='<my_client_secret>'))

d = []
total = 1  # temporary value so the loop runs at least once
offset = 0
while offset < total:
    results = sp.search(q="artist:guns n' roses", type='track', offset=offset, limit=50)
    total = results['tracks']['total']
    offset += 50  # advance to the next page
    for idx, track in enumerate(results['tracks']['items']):
        d.append({
            'Track': track['name'],
            'Album': track['album']['name'],
            'Artist': track['artists'][0]['name'],
            'Release Date': track['album']['release_date'],
            'Track Number': track['track_number'],
            'Popularity': track['popularity'],
            'Explicit': track['explicit'],
            'Duration': track['duration_ms'],
            'Audio Preview URL': track['preview_url'],
            'Album URL': track['album']['external_urls']['spotify']
        })
pd.DataFrame(d)
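
If a search matches more than 1,000 tracks, the API simply stops there, so it can help to make the paging logic explicit and reusable. A minimal sketch, assuming the same sp client as above (search_all_tracks is a hypothetical helper, not part of spotipy):

def search_all_tracks(sp, query, page_size=50, api_cap=1000):
    # Yield every track for a query, paging by offset until the reported
    # total (or Spotify's hard cap of 1000 items) is reached.
    offset, total = 0, None
    while total is None or offset < min(total, api_cap):
        page = sp.search(q=query, type='track', offset=offset, limit=page_size)
        total = page['tracks']['total']
        for track in page['tracks']['items']:
            yield track
        offset += page_size

tracks = list(search_all_tracks(sp, "artist:guns n' roses"))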

Post xlsx file to Flask route with React async/await

I have a project where I was originally using fetch for all of my requests, and my Excel upload function was working fine. I recently made a lot of changes to my application because I added user login and authentication, so I am now trying to reconfigure my routes using async/await in React. I am getting different errors that I don't know how to solve. Sometimes when I submit the uploaded file I get a 400 Bad Request, usually saying "The browser (or proxy) sent a request that this server could not understand. KeyError: 'project'". Other times I get an error on the back end that says "ValueError: Excel file format cannot be determined, you must specify an engine manually."
I am really confused about where the error is actually occurring, because the request seems to reach the Flask route and then fail.
This is my current request on the front-end:
const onSubmit = async (ev) => {
  ev.preventDefault();
  let formData = new FormData();
  formData.append("project", selectedFile);
  // let status = 0;
  const data = await api.post("/projects", {
    body: formData,
  });
  if (!data.ok) {
    console.log(data);
    console.log("error");
  } else {
    console.log(data);
    console.log("uploaded");
  }
};
and the back-end route:
@projects.route('/projects', methods=['POST'])
@authenticate(token_auth)
def request_load_pickle_sim():
    fn = save_file(None, "project", "xlsx")
    print(fn)
    request.files["project"].save(fn)
    result = process_raw_util_xl_file(fn)
    claims = result['claims']
    print(claims)
    utilities = result['utilities']
    weights = result['weights']
    maxdiff_scores = result['maxdiff_scores']
    data = jsonify(claims)
    print(data)
    mdp = MaxDiffProject()
    mdp.config = MutableDict({
        'claims': claims,
        'utilities': utilities,
        'weights': weights,
        'maxdiff_scores': maxdiff_scores
    })
    print(mdp.config)
    db.session.add(mdp)
    db.session.commit()
    mdp = MaxDiffProject().query.first()
    project_config = mdp.config
    claims = project_config['claims']
    print(claims)
    data = jsonify(claims)
    return data
My debugger is printing the instance of the file, so it seems like the back end is receiving the file. But after that, nothing works. The file is passed to the process_raw_util_xl_file function (which is where the ValueError comes from). I can't figure out what the core issue is because I'm getting conflicting errors. I guess I'm just confused because it all worked fine when I was using fetch requests.
The function on the back end is breaking here:
from math import e  # assumed import: the snippet uses e (Euler's number) without defining it

def process_raw_util_xl_file(xl_fn):
    # df = pd.read_csv(xl_fn.read())
    df = pd.read_excel(xl_fn)  # read the excel file
    print(df)
    row = df.shape[0]  # df.shape[0] = number of rows, df.shape[1] = number of columns
    index_col = [col for col in df if 'id' in col.lower()][0]
    print(index_col)
    df.set_index(index_col, inplace=True)  # setting the ID as the index
    # weights = df['Weights'] if 'Weights' in df else pd.Series([1]*row)  # weights array if present, otherwise all 1's
    if 'Weights' not in df:
        df['Weights'] = 1
    weights = df['Weights']
    df.drop('Weights', axis=1, inplace=True)
    sum = weights.values.sum()  # sum of the weights
    # if 'Weights' in df: df = df.drop('Weights', axis=1)  # removing weights from df if they are there
    rlh_cols = [col for col in df if 'rlh' in col.lower()][:1]
    df = df.drop(rlh_cols, axis=1)  # removing RLH columns if they are there
    max_diff_scores = (e ** df) / (1 + e ** df) * 100  # logistic transform of the utilities
    utils = df
    return {
        "utilities": utils,
        "claims": [col for col in utils],
        "maxdiff_scores": max_diff_scores,
        "weights": weights
    }
You are posting an object as a body parameter to the server, which gets sent with content type application/json, whereas according to the HTTP protocol, form data must be posted with content type multipart/form-data. Here's how you are doing it:
let formData = new FormData();
formData.append("project", selectedFile);
// let status = 0;
const data = await api.post("/projects", {
  body: formData,
});
According to the docs (at the end of the page), you must post form data like this:
let formData = new FormData();
formData.append("project", selectedFile);
// let status = 0;
const data = await api.post("/projects", formData);
Also, you cannot post a JSON body and form data at the same time within a single request.
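
On the Flask side, the ValueError ("Excel file format cannot be determined") is typically what you see when the route receives an empty or non-Excel payload, which is exactly what happens when the file arrives JSON-encoded instead of as multipart form data. Here is a minimal sketch of a defensive version of the route, assuming Flask and pandas with openpyxl installed; save_file and the question's processing pipeline are left out, and the validation shown is an illustrative addition, not the asker's code:

from flask import request, jsonify
import pandas as pd

@projects.route('/projects', methods=['POST'])
def upload_project():
    # With multipart/form-data, the upload appears in request.files under
    # the key used in formData.append("project", ...) on the front end.
    if 'project' not in request.files:
        return jsonify(error="no 'project' file part in the request"), 400
    upload = request.files['project']
    if not upload.filename.lower().endswith('.xlsx'):
        return jsonify(error="expected an .xlsx file"), 400
    # Read straight from the uploaded stream; naming the engine means
    # pandas does not have to guess the format from a file extension.
    df = pd.read_excel(upload, engine='openpyxl')
    return jsonify(columns=list(df.columns), rows=len(df))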

Display Dash DataTable through callback

I want to return a populated Dash DataTable based on the results of an input search. I've tried two approaches so far: returning the entire DataTable from the callback, and returning the columns and data separately. Neither has worked for me. I've included the relevant code for each option and the error message that results from each:
Return the data and columns separately:
@app.callback(
    [Output('table', 'data'),
     Output('table', 'columns')],
    [Input("button", "n_clicks")], state=[State('url', 'value')])
def update_table(n_click: int, url):
    if n_click > 1:
        summary, table = summarizer(url)
        columns = [{"name": i, "id": i, "deletable": True, "selectable": True} for i in table.columns]
        table = table.to_dict('records')
        return table, columns
    else:
        return [], []
The app.layout contains the following line
html.Div(dt.DataTable(id='table'))
The error message that results from this is:
Objects are not valid as a React child
The second approach was to pass the entire DataTable through the callback and display it using just an html.Div in the layout, like this:
@app.callback(
    Output('table', 'children'),
    [Input("button", "n_clicks")], state=[State('url', 'value')])
def update_table(n_click: int, url):
    if n_click > 1:
        summary, table = summarizer(url)
        columns = [{"name": i, "id": i, "deletable": True, "selectable": True} for i in table.columns]
        table = table.to_dict('records')
        return dt.DataTable(data=table, columns=columns)
    else:
        return []
html.Div(id='table')
The corresponding error was:
Objects are not valid as a React child
This error is confusing to me since it seems to be about the column definition, yet I can't pass in an array, and the documentation asks for a dictionary.
Full code sample:
import dash
import dash_core_components as dcc
import dash_html_components as html
import dash_bootstrap_components as dbc
import dash_table as dt
from dash.dependencies import Input, Output, State
import sd_material_ui
from newspaper import Article
import gensim
from gensim.summarization import summarize
from dash.exceptions import PreventUpdate
from newspaper import fulltext
import requests
import pandas as pd
import yake
import nltk
from newsapi import NewsApiClient
leftSources = ["cnn", "buzzfeed", "the-washington-post", "bbc-news", "vice-news", "newsweek", "techcrunch", "reuters", "politico", "newsweek", "msnbc"]
rightSources = ["fox-news", "national-review", "new-york-magazine", "breitbart-news", "business-insider", "the-wall-street-journal", "bloomberg", "the-washington-times", "the-hill", "the-american-conservative"]
# importing CSS
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
# similarArticleURL
getSimilarArticlesURL = "https://us-central1-secure-site-266302.cloudfunctions.net/getSimilarArticles?keywords="
getKeywordsURL = "https://us-central1-secure-site-266302.cloudfunctions.net/getKeyword?text="
getArticleTextURL = "https://us-central1-secure-site-266302.cloudfunctions.net/getArticleText?url="
allData = pd.DataFrame()
# instantiating dash application
app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])
server = app.server # the flask app
# helper functions
def generate_table(dataframe, max_rows=10):
    return html.Table([
        html.Thead(
            html.Tr([html.Th(col) for col in dataframe.columns])
        ),
        html.Tbody([
            html.Tr([
                html.Td(dataframe.iloc[i][col]) for col in dataframe.columns
            ]) for i in range(min(len(dataframe), max_rows))
        ])
    ])
app.layout = html.Div([
    html.Div(html.H3("Brief.Me"), style={'font-weight': 'bold', 'background-color': 'darkorange', 'color': 'white', 'text-align': 'center'}),
    html.Br(),
    html.Br(),
    dbc.Row([
        dbc.Col(dbc.Input(id='url', type='url', size=30, placeholder="Type or copy/paste an URL"), width={'size': 6, 'order': 1, 'offset': 3}),
        dbc.Col(dbc.Button("Summarize", id='button', n_clicks=1, color="primary", className="mr-1"), width={'order': 2})
    ]),
    html.Br(),
    # dbc.Row([
    #     dbc.Col(dcc.Loading(html.Div(html.Div(id="summary"), style={'font-weight':'bold'})), width={'size':6, 'offset':3})
    # ]),
    html.Div(id='table')
],
)
def fetch_similar_articles(keyword):
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    newsapi = NewsApiClient(api_key='ce7482cbd40f4d90a8eea404e7702db6')
    top_headlines = newsapi.get_top_headlines(q=keyword,
                                              sources='bbc-news,the-wall-street-journal,the-washington-post,fox-news,bloomberg, vice-news, politico, reuters, the-hill',
                                              language='en')
    return top_headlines["articles"]

def fetch_article_text(url):
    try:
        article = Article(url)
        article.download()
        article.parse()
        return article.text
    except:
        return None

def summarizer(url):
    global allData
    leftSummaries, rightSummaries = {}, {}
    text = fetch_article_text(url)
    main_summary = summarize(text)
    keywords = extract_keywords(text)
    urls = []
    rightData, leftData, allData = get_articles_content(keywords)
    rightDf, leftDf = pd.DataFrame(rightData), pd.DataFrame(leftData)
    allSources = pd.concat([rightDf, leftDf], axis=1)
    return main_summary, allData
def get_articles_content(keywords):
    '''
    This function will return a row of the dataframe where there is a title, source, url and summary.
    '''
    allResults, leftRows, rightRows = [], [], []
    for keyword in keywords:
        articleList = fetch_similar_articles(keyword)
        for elem in articleList:
            source = elem['source']
            url = elem['url']
            title = elem['title']
            text = fetch_article_text(url)
            if text is not None and len(text) > 1:
                summary = summarize(text)
                allResults.append({'title': title, 'url': url, 'source': source, 'summary': summary})
                if source in leftSources:
                    leftRows.append(pd.DataFrame({'title': title, 'url': url, 'source': source, 'summary': summary}))
                elif source in rightSources:
                    rightRows.append(pd.DataFrame({'title': title, 'url': url, 'source': source, 'summary': summary}))
    allResults = pd.DataFrame(allResults)
    return leftRows, rightRows, allResults
def extract_keywords_yake(text, phrase_length, num_keywords):
    custom_kw_extractor = yake.KeywordExtractor(n=phrase_length, top=num_keywords)
    keywords = custom_kw_extractor.extract_keywords(text)
    return keywords

def extract_keywords(text):
    '''
    Returns a list of keywords given the article text.
    '''
    global getKeywordsURL
    getKeywordsURL += text
    keywordRes = extract_keywords_yake(text, 2, 5)
    keywords = []
    for pair in keywordRes:
        keywords.append(pair[1])
    return keywords
@app.callback(  # Output('summary', 'children')
    Output('table', 'children'),
    [Input("button", "n_clicks")], state=[State('url', 'value')])
def update_table(n_click: int, url):
    if n_click > 1:
        summary, table = summarizer(url)
        columns = [{"name": i, "id": i, "deletable": True, "selectable": True} for i in table.columns]
        table = table.to_dict('records')
        return dt.DataTable(data=table, columns=columns)
    else:
        return [], []

if __name__ == '__main__':
    app.run_server(debug=True, host='0.0.0.0', port=8080)
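
For comparison, a minimal, self-contained version of the second pattern (returning a DataTable as the children of an html.Div) runs fine on its own, so checking it against the full app may help isolate where the error comes from. A sketch, assuming the same dash/dash_table versions as the question's imports:

import dash
import dash_html_components as html
import dash_table as dt
import pandas as pd
from dash.dependencies import Input, Output

app = dash.Dash(__name__)

app.layout = html.Div([
    html.Button("Load", id="button", n_clicks=0),
    html.Div(id="table"),  # the callback fills this Div with a DataTable
])

@app.callback(Output("table", "children"),
              [Input("button", "n_clicks")])
def update_table(n_clicks):
    if not n_clicks:
        return []  # a single Output should get a single value, not a tuple
    df = pd.DataFrame({"col_a": [1, 2], "col_b": ["x", "y"]})  # stand-in data
    columns = [{"name": c, "id": c} for c in df.columns]
    return dt.DataTable(data=df.to_dict("records"), columns=columns)

if __name__ == "__main__":
    app.run_server(debug=True)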

CakePHP 3.x : transform updateAll() into save() loop, for a multiple edit page

I use the audit-stash plugin, which works fine with all my tables. But I have a particular page where the user selects rows with checkboxes and then changes a specific field on all of them. The audits table contains a field called "primary_key" which does not seem to work for such a case.
In my controller function, I put this:
$this->request->data;
$data = $this->request->data;
if ($this->request->is(['patch', 'post', 'put']))
{
    $ids = $this->request->data('data.AssetsAssignations.id');
    $room_id = $this->request->data('room_id');
    $this->AssetsAssignations->updateAll(
        ['room_id' => $room_id],
        ['id IN' => $ids]
    );
}
In my table class, I used this:
$this->addBehavior('AuditStash.AuditLog');
I was told that there is no way around this for audit-stash, because updateAll() bypasses model callbacks by sending a query directly to the database. I was advised to update the records one by one if I need to keep the log.
How can I transform my updateAll() code into a save() loop?
This attempt, using save() and saveMany(), did not work for me:
$this->request->data;
$data = $this->request->data;
if ($this->request->is(['patch', 'post', 'put']))
{
    $ids = $this->request->data('data.AssetsAssignations.id');
    $asset_status_id = $this->request->data('asset_status_id');
    foreach ($ids as $id) {
        $this->AssetsAssignations->saveMany(
            ['asset_status_id' => $asset_status_id]
        );
    }
}
Thanks in advance.
Actually you don't have to call get($id) for every id. That fetches the entity from the table and causes a lot of useless queries:
if ($this->request->is(['patch', 'post', 'put']))
{
    $ids = $this->request->data('data.AssetsAssignations.id');
    $asset_status_id = $this->request->data('asset_status_id');
    $assetsAssignationsTable = TableRegistry::get('AssetsAssignations');
    foreach ($ids as $id) {
        $assetsAssignation = $assetsAssignationsTable->newEntity(); // returns an empty entity
        $assetsAssignation->id = $id; // assign the id to the entity
        $assetsAssignation->asset_status_id = $asset_status_id;
        $assetsAssignationsTable->save($assetsAssignation);
    }
}
Thanks to Greg, this code worked for me:
use Cake\ORM\TableRegistry;
...
if ($this->request->is(['patch', 'post', 'put']))
{
    $ids = $this->request->data('data.AssetsAssignations.id');
    $asset_status_id = $this->request->data('asset_status_id');
    $assetsAssignationsTable = TableRegistry::get('AssetsAssignations');
    foreach ($ids as $id) {
        $assetsAssignation = $assetsAssignationsTable->get($id); // fetch the assetsAssignation with this id
        $assetsAssignation->asset_status_id = $asset_status_id;
        $assetsAssignationsTable->save($assetsAssignation);
    }
}

How to use cursors for search in GAE?

When I RTFM, I can't understand how to specify paginated searches using the technique described in the manual. Here's my code:
def find_documents(query_string, limit, cursor):
    try:
        subject_desc = search.SortExpression(
            expression='date',
            direction=search.SortExpression.DESCENDING,
            default_value=datetime.now().date())
        # Sort up to 1000 matching results by date in descending order
        sort = search.SortOptions(expressions=[subject_desc], limit=1000)
        # Set query options
        options = search.QueryOptions(
            limit=limit,  # the number of results to return
            cursor=cursor,
            sort_options=sort,
            # returned_fields=['author', 'subject', 'summary'],
            # snippeted_fields=['content']
        )
        query = search.Query(query_string=query_string, options=options)
        index = search.Index(name=_INDEX_NAME)
        # Execute the query
        return index.search(query)
    except search.Error:
        logging.exception('Search failed')
        return None
class MainAdvIndexedPage(SearchBaseHandler):
    """Handles search requests for comments."""
    def get(self):
        """Handles a get request with a query."""
        regionname = 'Delhi'
        region = Region.all().filter('name = ', regionname).get()
        uri = urlparse(self.request.uri)
        query = ''
        if uri.query:
            query = parse_qs(uri.query)
            query = query['query'][0]
        results = find_documents(query, 50, search.Cursor())
        next_cursor = results.cursor
        template_values = {
            'results': results, 'next_cursor': next_cursor,
            'number_returned': len(results.results),
            'url': url, 'user': users.get_current_user(),
            'url_linktext': url_linktext, 'region': region, 'city': '',
            'request': self.request, 'form': SearchForm(), 'query': query
        }
        self.render_template('indexed.html', template_values)
The code above works and performs a search, but it doesn't page the results. I wonder about the following code in the manual:
next_cursor = results.cursor
next_cursor_urlsafe = next_cursor.web_safe_string
# save next_cursor_urlsafe
...
# restore next_cursor_urlsafe
results = find_documents(query_string, 20,
                         search.Cursor(web_safe_string=next_cursor_urlsafe))
What is next_cursor used for? How do I save it, and what is the purpose of saving it? How do I get a cursor in the first place? Should the code look something like this instead, using memcache to save and restore the cursor?
class MainAdvIndexedPage(SearchBaseHandler):
    """Handles search requests for comments."""
    def get(self):
        """Handles a get request with a query."""
        regionname = 'Delhi'
        region = Region.all().filter('name = ', regionname).get()
        uri = urlparse(self.request.uri)
        query = ''
        if uri.query:
            query = parse_qs(uri.query)
            query = query['query'][0]
        # restore next_cursor_urlsafe
        next_cursor_urlsafe = memcache.get('results_cursor')
        if last_cursor:
            results = find_documents(query_string, 50,
                                     search.Cursor(web_safe_string=next_cursor_urlsafe))
        results = find_documents(query, 50, search.Cursor())
        next_cursor = results.cursor
        next_cursor_urlsafe = next_cursor.web_safe_string
        # save next_cursor_urlsafe
        memcache.set('results_cursor', results.cursor)
        template_values = {
            'results': results, 'next_cursor': next_cursor,
            'number_returned': len(results.results),
            'url': url, 'user': users.get_current_user(),
            'url_linktext': url_linktext, 'region': region, 'city': '',
            'request': self.request, 'form': SearchForm(), 'query': query
        }
        self.render_template('indexed.html', template_values)
Update
From what I see in the answer, I'm supposed to use an HTTP GET query string to carry the cursor, but I still don't know exactly how. Please tell me how.
Update 2
This is my new effort.
def get(self):
    """Handles a get request with a query."""
    regionname = 'Delhi'
    region = Region.all().filter('name = ', regionname).get()
    cursor = self.request.get("cursor")
    uri = urlparse(self.request.uri)
    query = ''
    if uri.query:
        query = parse_qs(uri.query)
        query = query['query'][0]
    logging.info('search cursor: %s', search.Cursor())
    if cursor:
        results = find_documents(query, 50, cursor)
    else:
        results = find_documents(query, 50, search.Cursor())
    next_cursor = None
    if results and results.cursor:
        next_cursor = results.cursor.web_safe_string
    logging.info('next cursor: %s', str(next_cursor))
    template_values = {
        'results': results, 'cursor': next_cursor,
        'number_returned': len(results.results),
        'user': users.get_current_user(),
        'region': region, 'city': '', 'request': self.request,
        'form': SearchForm(), 'query': query
    }
I think I've understood how it's supposed to work with the above, and it outputs a cursor on the first hit, so I know how to get a cursor in the first place; that much is clearly documented. But I get this error message: cursor must be a Cursor, got unicode
No, you should not use memcache for that, especially with a constant key like 'results_cursor': that would mean all users get the same cursor, which would be bad.
You are already passing the cursor to the template context (although you should convert it to the web_safe_string, as you do in the second example). In the template, make sure the cursor string is included in the GET parameters of your "next" button; then, back in the view, extract it from there and pass it into the find_documents call.
Apart from the memcache issue, you're almost there with the second example, but you should ensure that the second call to find_documents is inside an else block so it doesn't overwrite the cursor version.
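
Concretely, the "cursor must be a Cursor, got unicode" error in Update 2 comes from passing the raw value from the query string straight into find_documents; it needs to be wrapped back into a search.Cursor. A minimal sketch of the handler wiring, assuming the find_documents from the question and a template whose "next" link appends the cursor as a GET parameter:

def get(self):
    """Handles a get request with a query, paging via a web-safe cursor."""
    query = self.request.get('query', '')
    cursor_string = self.request.get('cursor')  # empty on the first page
    if cursor_string:
        # Rebuild a real Cursor from the web-safe string in the URL;
        # passing the raw string is what raises the type error.
        cursor = search.Cursor(web_safe_string=cursor_string)
    else:
        cursor = search.Cursor()
    results = find_documents(query, 50, cursor)
    next_cursor = None
    if results and results.cursor:
        next_cursor = results.cursor.web_safe_string
    # In the template, render the "next" link as e.g.
    # <a href="/search?query={{ query }}&cursor={{ next_cursor }}">Next</a>
    self.render_template('indexed.html', {
        'results': results,
        'query': query,
        'next_cursor': next_cursor,
    })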
