I am trying to migrate a working Dash app from the Heroku free platform to Google App Engine (GAE). The app works as expected both locally and on Heroku.
The app loads on GAE.
However, the default query doesn't seem to get parsed correctly on GAE, resulting in blank visualisations. If you select dropdown options, the charts will load. It seems most likely to be a version conflict or a setting on GAE.
I have made the necessary changes to the app.yaml file as below:
runtime: python39
entrypoint: gunicorn -b :$PORT
My requirements.txt is as follows:
I have reviewed the callback but don't see a problem. I have included the callback code below, since I thought it may be the most relevant. (raw_trees is just a loaded CSV file.)
# Set up callbacks/backend
@app.callback(
    Output("bar", "srcDoc"),
    Output("timeline", "srcDoc"),
    Output("diameter", "srcDoc"),
    Output("density", "srcDoc"),
    Output("map", "figure"),
    Input("picker_date", "start_date"),
    Input("picker_date", "end_date"),
    Input("filter_neighbourhood", "value"),
    Input("filter_cultivar", "value"),
    Input("slider_diameter", "value"),
    Input("map", "selectedData"),
)
def main_callback(
    start_date, end_date, neighbourhood, cultivar, diameter_range, selectedData
):
    # Build new dataset and call all charts
    # Date input Cleanup
    if start_date is None:
        start_date = "2022-01-01"
    if end_date is None:
        end_date = "2022-05-30"
    start_date = pd.Timestamp(date.fromisoformat(start_date))
    end_date = pd.Timestamp(date.fromisoformat(end_date))
    filtered_trees = raw_trees
    # Filter by selection from big map
    if selectedData is not None:
        selectedTrees = []
        if "points" in selectedData:
            if selectedData["points"] is not None:
                for point in selectedData["points"]:
                    # print(point)
                    # print(selectedTrees)
        filtered_trees = filtered_trees[filtered_trees["TREE_ID"].isin(selectedTrees)]
    # Filter by neighbourhood
    if neighbourhood:
        filtered_trees = filtered_trees[
    # Filter by date
    filtered_trees = filtered_trees[
        (
            (filtered_trees["BLOOM_START"] <= start_date)
            & (filtered_trees["BLOOM_END"] >= start_date)
        )
        | (
            (filtered_trees["BLOOM_START"] <= end_date)
            & (filtered_trees["BLOOM_END"] >= end_date)
        )
        | (filtered_trees["BLOOM_START"].between(start_date, end_date))
        | (filtered_trees["BLOOM_END"].between(start_date, end_date))
    ]
    # Filter by Diameter
    filtered_trees = filtered_trees[
        filtered_trees["DIAMETER"].between(diameter_range[0], diameter_range[1])
    ]
    if cultivar:
        filtered_trees = filtered_trees[filtered_trees["CULTIVAR_NAME"].isin(cultivar)]
    bar = bar_plot(filtered_trees)
    timeline = timeline_plot(filtered_trees)
    diameter = diameter_plot(filtered_trees)
    density = density_map(filtered_trees)
    big_map = street_map(filtered_trees)
    return bar, timeline, diameter, density, big_map
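As a side note on the date filter in the callback above: assuming every tree has BLOOM_START <= BLOOM_END and the date picker returns start_date <= end_date, the four OR-ed conditions amount to a standard interval-overlap test. The sketch below checks that equivalence with plain `datetime.date` values. The helper names are hypothetical and not part of the app.

```python
from datetime import date, timedelta
from itertools import product


def overlaps_four_way(bloom_start, bloom_end, start_date, end_date):
    """Mirror of the four conditions used in the callback's date filter."""
    return (
        (bloom_start <= start_date <= bloom_end)
        or (bloom_start <= end_date <= bloom_end)
        or (start_date <= bloom_start <= end_date)
        or (start_date <= bloom_end <= end_date)
    )


def overlaps_compact(bloom_start, bloom_end, start_date, end_date):
    """Two closed intervals overlap when each one starts before the other ends."""
    return bloom_start <= end_date and bloom_end >= start_date


def check_equivalence(n_days=8):
    """Compare both tests on every valid pair of intervals in a small window."""
    days = [date(2022, 4, 1) + timedelta(days=i) for i in range(n_days)]
    for bs, be, s, e in product(days, repeat=4):
        if bs <= be and s <= e:
            if overlaps_four_way(bs, be, s, e) != overlaps_compact(bs, be, s, e):
                return False
    return True


print(check_equivalence())
```

In pandas terms the compact form would be `(filtered_trees["BLOOM_START"] <= end_date) & (filtered_trees["BLOOM_END"] >= start_date)`. This does not explain the blank charts, but it makes the filter easier to verify while debugging.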
Thanks for any help or insight; this is my first effort on GAE. Alternatively, I would consider a more appropriate platform for my deployment if anyone has a suggestion of that nature.
The entire project is here

After trying a second hosting solution (Render), I had the same issue. I was able to solve it on Render by increasing the worker timeout in the gunicorn command like so:
web: gunicorn --timeout 1000
This is likely just due to the resource constraints on these free accounts, and the same constraint is likely the issue on GAE as well, though the timeout flag didn't work in the entrypoint there.
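Gunicorn can also read its settings from a Python config file, which may be worth trying where the flag on the entrypoint does not seem to take effect. The following is a minimal sketch, assuming a recent gunicorn that picks up `gunicorn.conf.py` from the working directory. It is untested on GAE, and the worker count and fallback port are assumptions.

```python
# gunicorn.conf.py (hypothetical file placed next to the app module)
import os

# Same value as the --timeout flag used on Render: seconds a worker may
# stay silent before gunicorn kills and restarts it.
timeout = 1000

# Assumption: the platform provides the port in the PORT environment
# variable; 8080 is only a fallback for local runs.
bind = ":" + os.environ.get("PORT", "8080")

# Assumption: a single worker, to keep memory use low on a small instance.
workers = 1
```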


How do I read the built-in Project.toml from a Pluto notebook?

I would like to instantiate the Project.toml that is built into a Pluto notebook by its native package manager. How do I read it from the notebook?
Say, I have a notebook, e.g.,
nb_source = ""
How can I create a temporary environment, and get the packages for the project of this notebook? In particular, how do I complete the following code?
import Pkg; Pkg.activate(".")
import Pluto, Pkg
nb = download(nb_source, ".")
### Some code using Pluto's build in package manager
### to read the Project.toml from nb --> nb_project_toml
cp(nb_project_toml, "./Project.toml", force=true)
So, first of all, the notebook you are looking at is a Pluto 0.17.0 notebook, which does not have the internal package manager. I think it was added in Pluto 0.19.0.
This is what the last few cells look like in a notebook that uses Pluto's internal package manager:
# ╔═╡ 00000000-0000-0000-0000-000000000001
PLUTO_PROJECT_TOML_CONTENTS = """
[deps]
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
PlutoUI = "7f904dfe-b85e-4ff6-b463-dae2292396a8"
PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
[compat]
Plots = "~1.32.0"
PlutoUI = "~0.7.40"
PyCall = "~1.94.1"
"""
# ╔═╡ 00000000-0000-0000-0000-000000000002
PLUTO_MANIFEST_TOML_CONTENTS = """
# This file is machine-generated - editing it directly is not advised
julia_version = "1.8.0"
so you could add something like:
write("./Project.toml", PLUTO_PROJECT_TOML_CONTENTS)
This has the drawback of running all the code in your notebook, which might take a while.
Alternatively, you could read the notebook file until you find the # ╔═╡ 00000000-0000-0000-0000-000000000001 line and then either parse the following string yourself or eval everything after that (something like eval(Meta.parse(string_stuff_after_comment)) should do it...)
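The "read until the marker and parse the string yourself" idea can be sketched as follows. Python is used here only to illustrate the parsing logic, and the same steps carry over to Julia. The function name and the sample notebook text are hypothetical, and the sketch assumes the cell body is a plain triple-quoted string with no escapes or interpolation.

```python
import re

PROJECT_CELL_ID = "00000000-0000-0000-0000-000000000001"


def extract_project_toml(notebook_text):
    """Return the Project.toml text embedded in a Pluto notebook, or None."""
    marker = "# ╔═╡ " + PROJECT_CELL_ID
    start = notebook_text.find(marker)
    if start == -1:
        return None  # no embedded package environment in this notebook
    # The cell body is expected to look like: NAME = """ ... """
    match = re.search(r'"""\n?(.*?)"""', notebook_text[start:], flags=re.DOTALL)
    return match.group(1) if match else None


# Hypothetical, shortened notebook tail used only for demonstration.
sample = '''# ╔═╡ 00000000-0000-0000-0000-000000000001
PLUTO_PROJECT_TOML_CONTENTS = """
[deps]
PlutoUI = "7f904dfe-b85e-4ff6-b463-dae2292396a8"

[compat]
PlutoUI = "~0.7.40"
"""

# ╔═╡ 00000000-0000-0000-0000-000000000002
PLUTO_MANIFEST_TOML_CONTENTS = """
julia_version = "1.8.0"
"""
'''

project_toml = extract_project_toml(sample)
print(project_toml)
```

The extracted text can then be written to a Project.toml file without running any cell of the notebook.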
I hope that helps a little bit.
Pluto.load_notebook_nobackup() reads the information of a notebook. The result holds a dictionary of dependencies in the field .nbpkg_ctx.env.project.deps:
import Pluto, Pkg
nb_source = ""
nb = download(nb_source)
nb_info = Pluto.load_notebook_nobackup(nb)
deps = nb_info.nbpkg_ctx.env.project.deps
Pkg.add([Pkg.PackageSpec(name=p, uuid=u) for (p, u) in deps])

Train an already trained model in Sagemaker and Huggingface without re-initialising

Let's say I have successfully trained a model on some training data for 10 epochs. How can I then access the very same model and train for a further 10 epochs?
In the docs it suggests "you need to specify a checkpoint output path through hyperparameters" --> how?
# define my estimator the standard way
huggingface_estimator = HuggingFace(
    hyperparameters=hyperparameters,
)
# train the model
huggingface_estimator.fit(
    {'train': training_input_path, 'test': test_input_path}
)
If I run it again, it will just start the whole thing over and overwrite my previous training.
You can find the relevant checkpoint save/load code in Spot Instances - Amazon SageMaker x Hugging Face Transformers.
(The example enables Spot instances, but you can use on-demand).
In hyperparameters you set: 'output_dir':'/opt/ml/checkpoints'.
You define a checkpoint_s3_uri in the Estimator (which is unique to the series of jobs you'll run).
You add code to support checkpointing:
from transformers.trainer_utils import get_last_checkpoint
# check if a checkpoint exists; if so, continue training from it
if get_last_checkpoint(args.output_dir) is not None:
    logger.info("***** continue training *****")
    last_checkpoint = get_last_checkpoint(args.output_dir)
    trainer.train(resume_from_checkpoint=last_checkpoint)
else:
    trainer.train()
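The two settings from the steps above (the `output_dir` hyperparameter and the `checkpoint_s3_uri` of the estimator) could be kept together in one small helper so that every job in a series uses the same values. This is a sketch only: the helper name, the bucket, and the series name are hypothetical, and the S3 prefix layout is an assumption.

```python
def checkpoint_settings(bucket, job_series):
    """Build the checkpoint-related arguments shared by every job in a series."""
    return {
        # SageMaker syncs the local checkpoint folder with this S3 prefix.
        "checkpoint_s3_uri": f"s3://{bucket}/{job_series}/checkpoints",
        # output_dir of the training script points at the synced local folder.
        "hyperparameters": {"output_dir": "/opt/ml/checkpoints"},
    }


first = checkpoint_settings("my-bucket", "bert-run-1")
second = checkpoint_settings("my-bucket", "bert-run-1")
# Both jobs point at the same checkpoints, so the second one can resume.
print(first["checkpoint_s3_uri"] == second["checkpoint_s3_uri"])
```

These values would then be passed to the estimator, for example as `checkpoint_s3_uri=first["checkpoint_s3_uri"]` with the `output_dir` entry merged into the existing hyperparameters. Note that the Trainer resumes at the saved step and stops at the configured total, so a further 10 epochs likely means setting the epoch count to 20 for the second job rather than leaving it at 10.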

How do I embed code through my text editor?

I'm running Hugo and editing my pages using Notepad++. I'd like to embed some code similar to the page here.
My Hugo version is
Hugo Static Site Generator v0.55.6-A5D4C82D windows/amd64 BuildDate: 2019-05-18T07:57:00Z
My config.toml file is below. As you can see, I've added the pygments options to the top of the page:
pygmentsCodefences = true
pygmentsStyle = "autumn"
baseurl = ""
title = "Blake Shurtz"
theme = "hugo-creative-portfolio-theme"
languageCode = "en-us"
# Enable comments by entering your Disqus shortname
disqusShortname = ""
# Enable Google Analytics by entering your tracking code
googleAnalytics = ""
# Style options: default (pink), blue, green, pink, red, sea, violet
# Use custom.css for your custom styling
style = "default"
description = "Describe your website"
copyright = "©2019 Blake Shurtz"
sidebarAbout = [
  "I am a research statistician who enjoys building models and apps.",
  "Originally from the Bay Area, currently based in central CA."
]
# Contact page
# Since this template is static, the contact form uses as a
# proxy. The form makes a POST request to their servers to send the actual
# email. Visitors can send up to a 1000 emails each month for free.
# What you need to do for the setup?
# - set your email address under 'email' below
# - upload the generated site to your server
# - send a dummy email yourself to confirm your account
# - click the confirm link in the email from
# - you're done. Happy mailing!
email = ""
# Optional Matomo analytics (formerly piwik)
# []
# URL = ""
# ID = "42"
# # Track all subdomains with "*" (Optional)
# domain = ""
# # Optional integrity check hash
# hash = ""
# Nav links in the side bar
name = "Home"
url = "portfolio/"
home = true
name = "About"
url = "about/"
name = "Get in touch"
url = "contact/"
stackoverflow = ""
twitter = ""
email = ""
linkedin = ""
github = ""
Can someone give me an example of what I need to write in my text editor in order to include the code?
I'm assuming you mean using markdown syntax to format text as code.
Surround your code with three backticks at the beginning and at the end.
```python (or whatever language)
code here
```
As Ambrose Leung's answer mentions, you can include code blocks in markdown by wrapping them in 3 backticks:
some code here
To get syntax highlighting, you can use Chroma, which is built into Hugo. Just add these lines to the top of your config.toml file (don't let the names confuse you, they say pygments but are for chroma):
pygmentsCodefences = true
pygmentsStyle = "pygments"
You can set the pygmentsStyle value to any of the styles from the style gallery.

Zeppelin - pass variable from Spark to Markdown to generate dynamic narrative text

Is it possible to pass a variable from Spark interpreter (pyspark or sql) to Markdown? The requirement is to display a nicely formatted text (i.e. Markdown) such as "20 events occurred between 2017-01-01 and 2017-01-08" where the 20, 2017-01-01 and 2017-01-08 are dynamically populated based on output from other paragraphs.
Posting this for the benefit of other users; this is what I have been able to find:
Markdown paragraphs can only contain static text.
But it is possible to achieve a dynamic formatted text output with the Angular interpreter instead.
(First paragraph)
// create data frame
val eventLogDF = ...
// register temp table for SQL access
eventLogDF.registerTempTable( "eventlog" )
val query = sql( "select max(Date), min(Date), count(*) from eventlog" ).take(1)(0)
val maxDate = query(0).toString()
val minDate = query(1).toString()
val evCount = query(2).toString()
// bind variables which can be accessed from angular interpreter
z.angularBind( "maxDate", maxDate )
z.angularBind( "minDate", minDate )
z.angularBind( "evCount", evCount )
(Second paragraph)
<div>There were <b>{{evCount}} events</b> between <b>{{minDate}}</b> and <b>{{maxDate}}</b>.</div>
You could also render Markdown by translating it into HTML first. This suits those who already have a Markdown template for their output, or whose Zeppelin environment has no Angular interpreter (e.g. a K8s deployment).
First, install markdown2.
pip install markdown2
And use it.
import markdown2
# prepare your markdown string
markdown_string = template_mymarkdown.format(**locals())
# use Zeppelin %html for output
print("%html", markdown2.markdown(markdown_string, extras=["tables"]))
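The `template_mymarkdown` used above is not shown, so here is a minimal sketch of what such a template might look like. The template text and the `render_summary` helper are hypothetical. The placeholder names are borrowed from the variables bound in the Angular example, and the values are passed explicitly instead of relying on `locals()`.

```python
# Hypothetical template; placeholders reuse the names evCount, minDate, maxDate.
template_mymarkdown = (
    "## Event summary\n\n"
    "There were **{evCount} events** between **{minDate}** and **{maxDate}**.\n"
)


def render_summary(evCount, minDate, maxDate):
    """Fill the template with explicit values instead of relying on locals()."""
    return template_mymarkdown.format(
        evCount=evCount, minDate=minDate, maxDate=maxDate
    )


markdown_string = render_summary(20, "2017-01-01", "2017-01-08")
print(markdown_string)
```

The resulting `markdown_string` can be handed to `markdown2.markdown(...)` exactly as in the snippet above. Passing the values explicitly makes it clear which variables the narrative text depends on.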
A screenshot for example:

R tm: reloading a 'PCorpus' backend filehash database as corpus (e.g. in restarted session/script)

Having learned loads from answers on this site (thanks!), it's finally time to ask my own question.
I'm using R (tm and lsa packages) to create, clean and simplify, and then run LSA (latent semantic analysis) on, a corpus of about 15,000 text documents. I'm doing this in R 3.0.0 under Mac OS X 10.6.
For efficiency (and to cope with having too little RAM), I've been trying to use either the 'PCorpus' option in tm (backend database support provided by the 'filehash' package), or the newer 'tm.plugin.dc' option for so-called 'distributed' corpus processing. But I don't really understand how either one works under the bonnet.
An apparent bug using DCorpus with tm_map (not relevant right now) led me to do some of the preprocessing work with the PCorpus option instead. And it takes hours. So I use R CMD BATCH to run a script doing things like:
> # load corpus from predefined directory path,
> # and create backend database to support processing:
> bigCcorp = PCorpus(bigCdir, readerControl = list(load=FALSE), dbControl = list(useDb = TRUE, dbName = "bigCdb", dbType = "DB1"))
> # converting to lower case:
> bigCcorp = tm_map(bigCcorp, tolower)
> # removing stopwords:
> stoppedCcorp = tm_map(bigCcorp, removeWords, stoplist)
Now, supposing my script crashes soon after this point, or I just forget to export the corpus in some other form, and then I restart R. The database is still there on my hard drive, full of nicely tidied-up data. Surely I can reload it back into the new R session, to carry on with the corpus processing, instead of starting all over again?
It feels like a noodle question... but no amount of dbInit() or dbLoad() or variations on the 'PCorpus()' function seem to work. Does anyone know the correct incantation?
I've scoured all the related documentation, and every paper and web forum I can find, but total blank - nobody seems to have done it. Or have I missed it?
The original question was from 2013. Meanwhile, in Feb 2015, a duplicate, or similar question, has been answered:
How to reconnect to the PCorpus in the R tm package? The answer in that post is essential, although pretty minimalist, so I'll try to augment it here.
These are some comments I've just discovered while working on a similar problem:
Note that the dbInit() function is not part of the tm package.
First you need to install the filehash package, which the tm-Documentation only "suggests" to install. This means it is not a hard dependency of tm.
Supposedly, you can also use the filehashSQLite package with library("filehashSQLite") instead of library("filehash"); both packages have the same interface and work seamlessly together, thanks to their object-oriented design. So also install "filehashSQLite" (edit 2016: some functions such as tm::content_transformer() are not implemented for filehashSQLite).
Then this works:
library("tm")
library("filehashSQLite")
# this string becomes the filename; it must not contain dots.
# Example: "mydata.sqlite" is not permitted.
s <- "sqldb_pcorpus_mydata" # replace mydata with something more descriptive
if (!file.exists(s)) {
  # csv is a data frame of 900 documents, 18 cols/features
  pc <- PCorpus(DataframeSource(csv), readerControl = list(language = "en"),
                dbControl = list(dbName = s, dbType = "SQLite"))
  dbCreate(s, "SQLite")
  db <- dbInit(s, "SQLite")
  # add another record, just to show we can.
  # key = "test", value = "hi there"
  dbInsert(db, "test", "hi there")
} else {
  db <- dbInit(s, "SQLite")
  pc <- dbLoad(db)
}
# <<PCorpus>>
# Metadata: corpus specific: 0, document level (indexed): 0
#Content: documents: 900
dbFetch(db, "test")
# remove it
#reload it
db <- dbInit(s, "SQLite")
pc <- dbLoad(db)
# the corpus entries are now accessible, but not loaded into memory.
# now 900 documents are bound via "Active Bindings", created by makeActiveBinding() from the base package
# [1] "1" "2" "3" "4" "5" "6" "7" "8" "9"
# ...
# [900]
#[883] "883" "884" "885" "886" "887" "888" "889" "890" "891" "892"
#"893" "894" "895" "896" "897" "898" "899" "900"
#[901] "test"
dbFetch(db, "900")
# <<PlainTextDocument>>
# Metadata: 7
# Content: chars: 33
dbFetch(db, "test")
#[1] "hi there"
This is what the database backend looks like. You can see that the documents from the data frame have been encoded somehow, inside the sqlite table.
This is what my RStudio IDE shows me:
