Spacy on AppEngine standard - google-app-engine

I'm trying to use Spacy on the new AppEngine Standard Python 3.7 runtime.
When I try to deploy I get:
ERROR: (gcloud.app.deploy) Cannot upload file
[/my/project/path/venv/lib/python3.7/site-packages/spacy/lang/tr/lemmatizer.py],
which has size [41523943] (greater than maximum allowed size of
[33554432]). Please delete the file or add to the skip_files entry in
your application .yaml file and try again.
A few oddities:
The docs seem to indicate that I don't need to upload the virtual environment and it will be created from requirements.txt
Looking at the log file, it seems to ignore .pyc files, but not the venv directory
The error message says to add the file to skip_files in the application .yaml file and try again, but the docs say the python3.7 runtime doesn't use skip_files and to use a .gcloudignore file instead. Adding venv/ or venv/* to .gcloudignore doesn't work, though (it appears to be ignored).

To fix this, I needed to update gcloud and reauthenticate:
gcloud components update
gcloud auth login
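
If the deploy still fails after updating, it may help to see which files in the project exceed the limit quoted in the error (33554432 bytes). Below is a minimal Python sketch of such a check. The function name find_oversized_files and the skip_dirs parameter are hypothetical, and the sketch does not read .gcloudignore; it only prunes the directory names you pass in.

```python
import os

# Limit quoted in the gcloud error message above (32 MiB).
MAX_FILE_SIZE = 33554432


def find_oversized_files(root, skip_dirs=(), limit=MAX_FILE_SIZE):
    """Return sorted (relative_path, size) pairs for files larger than limit.

    skip_dirs holds directory names to prune (for example "venv"). This is a
    plain name match, not an implementation of .gcloudignore syntax.
    """
    oversized = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Broken symlinks or unreadable files are skipped.
                continue
            if size > limit:
                oversized.append((os.path.relpath(path, root), size))
    return sorted(oversized)
```

Calling find_oversized_files(".") from the project root lists every file over the limit. Passing skip_dirs=("venv",) shows what would remain if the venv directory were actually excluded from the upload.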

Unable to provide custom app.yaml file for Java GCP App Engine project

I have a Java 11 GCP App Engine project and I'm trying to use different app.yaml files depending on the environment (e.g., app-dev.yaml, app-prod.yaml, etc.). The YAML files live in the appengine directory, e.g. src/main/appengine/app-dev.yaml.
There is an SO post about this already, but the answer doesn't work because it clobbers the descriptor which in Java should be the pom.xml (see my Approach 2 for more information).
Approach #1
UPDATE: Solved! In order to use this approach you must be on gcloud 298.0.0+
First, I tried using the --appyaml=APPYAML argument found in https://cloud.google.com/sdk/gcloud/reference/app/deploy#--appyaml:
gcloud app deploy [DEPLOYABLES …] [--appyaml=APPYAML] [--bucket=BUCKET] ...
I ran the following and received an error that the appyaml argument isn't recognized.
$ gcloud --project=my-project app deploy --appyaml=app-dev.yaml
ERROR: (gcloud.app.deploy) unrecognized arguments: --appyaml=app-dev.yaml
The fully qualified path to app-dev.yaml doesn't work either.
Approach #2
Next I found a slightly different syntax in https://cloud.google.com/appengine/docs/flexible/java/configuring-your-app-with-app-yaml that looks like this:
gcloud app deploy service-name-app.yaml
I tried the same locally, pointing to my custom app-dev.yaml like so, but it breaks:
$ gcloud --project=my-project app deploy src/main/appengine/app-dev.yaml
...
descriptor: [/Users/SomeDev/IdeaProjects/my-project/app-server/src/main/appengine/app-dev.yaml]
source: [/Users/SomeDev/IdeaProjects/my-project/app-server/src/main/appengine]
target project: [my-project]
target service: [default]
target version: [20200831abcdefg]
target url: [https://my-project.uc.r.appspot.com]
This breaks because it thinks the app-dev.yaml is the descriptor file instead of a pom.xml, so it errors out with the following:
Error message: did not find any jar files with a Main-Class manifest entry
To compare, I ran a normal deployment without a custom yaml file and you can see the pom.xml is the value of the descriptor.
$ gcloud --project=my-project app deploy
...
descriptor: [/Users/SomeDev/IdeaProjects/my-project/app-server/pom.xml]
source: [/Users/SomeDev/IdeaProjects/my-project/app-server]
target project: [my-project]
target service: [default]
target version: [20200831abcdefg]
target url: [https://my-project.uc.r.appspot.com]
Is there a recommended way to make this work, or is this the wrong approach entirely?
For your "Approach #1": you have to upgrade gcloud to version 298.0.0 or later, where the --appyaml parameter was added (quite recently, in June 2020).
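
To make the version requirement concrete, here is a minimal Python sketch that compares a gcloud version string against 298.0.0. It assumes you already have the version as dotted text (for example, copied from the output of gcloud version); the names parse_version and supports_appyaml are hypothetical helpers, not part of gcloud.

```python
# First gcloud release with --appyaml, per the answer above.
MIN_APPYAML_VERSION = (298, 0, 0)


def parse_version(text):
    """Turn a dotted version string such as "298.0.0" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def supports_appyaml(version_text):
    """Return True if the given gcloud version is 298.0.0 or later."""
    # Comparing tuples of ints avoids the pitfalls of comparing strings,
    # where "30.0.0" would wrongly sort after "298.0.0".
    return parse_version(version_text) >= MIN_APPYAML_VERSION
```

If supports_appyaml returns False for your installed version, gcloud components update should bring it up to date.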
For your "Approach #2": if you run gcloud app deploy without parameters, it searches for an app.yaml descriptor in the current directory and, if none is found, for pom.xml. If you want to use a pom.xml from a different location, you have to remove it from the current directory. I didn't test this to the very end; I only checked the descriptor value in the summary.
That said, I don't think the above is the best way to do it. Using pom.xml as the descriptor means you are using the feature called "deploy your Maven project as source code", which is not the main way to deploy to App Engine with Maven.
As I understand it, if Maven was used for the build, it's possible to use the jar in the entrypoint of the app.yaml file (reference) or the Maven goal appengine:deploy (reference + an article that should be interesting).

How do I run dev_appserver.py from within my feature file in behave python?

dev_appserver.py starts a local deployment of my App Engine service. I want to run my behave tests against this local service, so I want to start the server from within my tests first. How do I run the dev_appserver.py app.yaml command at the start of my behave feature file?
I have tried subprocess.run("python","dev_appserver.py") but it says it couldn't find the file dev_appserver.py. I'm trying this on Windows.
When you launch executables using subprocess methods, you typically don't get the same environment by default (execution path and current working directory) that you get in a shell/terminal. This means you may need to reference files (both executables and regular files) by their full paths in the list of arguments you pass to those methods.
Since the subprocess.run() execution complains about the dev_appserver.py location, it means it's finding python OK (you may still want to check that it's the 2.7 version) and you need to provide the full path for dev_appserver.py, which depends on your OS and the SDK you use. On Linux, for example (sorry, I'm not a Windows guy), the path is:
<GAE_SDK_dir>/dev_appserver.py if using the GAE SDK
<gcloud_SDK_dir>/bin/dev_appserver.py if using the gcloud SDK
You'll most likely need to pass the path to your GAE app's app.yaml file, too, as an argument to dev_appserver.py. Otherwise you'll see it complain about being unable to locate the app or its files, or things will just run badly: if the app.yaml file isn't specified, dev_appserver.py attempts to auto-detect it, and that doesn't work in all cases. I'd avoid complications and just specify the app.yaml file(s).
Also note that the subprocess.run() args should be a list. Something like this:
subprocess.run(['python', '<sdk_path_to>/dev_appserver.py', '<app_path_to>/app.yaml'])
See also appcfg.py not working in command line - the post is about a different executable, but the answers are equally applicable to dev_appserver.py.
Quoting the App Engine documentation:
To start the local development server:
Run the dev_appserver.py command as follows from the directory that contains your app's app.yaml configuration file:
Specify the directory path to your app, for example:
dev_appserver.py [PATH_TO_YOUR_APP]
Alternatively, you can specify the configuration file of a specific service, for example:
dev_appserver.py app.yaml
To change the port, you include the --port option:
dev_appserver.py --port=9999 [PATH_TO_YOUR_APP]
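
Putting the pieces above together, one possible way to start the server for a behave run is from behave's environment.py hooks rather than from the feature file itself. The following is only a sketch under assumptions: the two placeholder paths must be replaced with real absolute paths, build_dev_appserver_cmd is a hypothetical helper, and "python" is assumed to resolve to an interpreter that can run dev_appserver.py.

```python
import subprocess

# Placeholder paths (assumptions): replace with the real absolute paths.
DEV_APPSERVER = "<sdk_path_to>/dev_appserver.py"
APP_YAML = "<app_path_to>/app.yaml"


def build_dev_appserver_cmd(dev_appserver_path, app_yaml_path,
                            port=None, python_exe="python"):
    """Build the argument list for subprocess; paths should be absolute."""
    cmd = [python_exe, dev_appserver_path]
    if port is not None:
        cmd.append("--port={}".format(port))
    cmd.append(app_yaml_path)
    return cmd


def before_all(context):
    # behave calls this hook once, before any feature runs.
    cmd = build_dev_appserver_cmd(DEV_APPSERVER, APP_YAML, port=9999)
    # Popen returns immediately; subprocess.run would block until the
    # server exits, so the tests would never start.
    context.dev_appserver = subprocess.Popen(cmd)


def after_all(context):
    # behave calls this hook once, after all features have run.
    context.dev_appserver.terminate()
    context.dev_appserver.wait()
```

Popen does not wait for the server to finish booting, so the first steps may need to retry until the chosen port accepts connections.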

bad import "syscall" for cloud storage APIs

I am following the instructions on https://cloud.google.com/appengine/docs/go/googlecloudstorageclient/download to begin migrating some code from the now-deprecated Files API to the new Cloud Storage API, without success.
The steps I'm following are ...
I'm running appengine v1.9.23 which is later than the required appengine v1.8.1.
My $GOPATH is set, so I skip step #1.
I proceed to step #2:
goapp get -u golang.org/x/oauth2
goapp get -u google.golang.org/cloud/storage
I am not developing on a managed VM, so I skip step #3.
Now when I run the application, I get:
go-app-builder: Failed parsing input: parser: bad import "syscall" in goapp/src/golang.org/x/net/internal/nettest/error_posix.go
What am I doing wrong?
Steps to reproduce
Download and install the Google App Engine runtime, version 1.9.23, from https://console.cloud.google.com/storage/browser/appengine-sdks/featured/ . Follow the installation instructions documented at https://cloud.google.com/appengine/downloads?hl=en
Create an appengine project directory:
% mkdir $HOME/myapp
Create a new app.yaml file as ~/myapp/app.yaml. Read the directions on the Google website for details: https://cloud.google.com/appengine/docs/go/config/appconfig
I use a version that does not have the static resources:
application: myapp
version: alpha-001
runtime: go
api_version: go1
handlers:
- url: /.*
  script: _go_app
Create a location for the Go source files.
% mkdir $HOME/myapp/go
Set your GOPATH to the location of your sources
% export GOPATH=$HOME/myapp/go
Get the Go appengine example project: https://github.com/golang/example
% goapp get github.com/golang/example/appengine-hello
This command will download the example app to the first path entry in the GOPATH
Install the Google Cloud Storage client libraries as directed in https://cloud.google.com/appengine/docs/go/googlecloudstorageclient/download . Reference the steps at the top of this question for more details. Following the directions should result in you running 2 commands:
% go get -u golang.org/x/oauth2
% go get -u google.golang.org/cloud/storage
Attempt to run your go application
% goapp serve
You will see the following compilation error (no stack trace):
2015/12/23 10:37:07 go-app-builder: Failed parsing input: parser: bad import "syscall" in go/src/golang.org/x/net/ipv6/control_unix.go
This error is caused by either of two scenarios:
1) Implicitly importing syscall by importing another package that uses it, as referenced in this related question.
2) Having the package source files in your GOPATH located in a directory at or below the level of your project's app.yaml (e.g. app.yaml in ~/go, and package sources in ~/go/gopath/src). If a package like x/net/internal/nettest exists in your GOPATH, the syscall import will be parsed by goapp at compile time and throw the compilation error.
Avoiding these two scenarios should be sufficient to prevent any bad import "syscall" errors or related compilation errors.
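
Scenario 2 can be checked mechanically. Here is a minimal Python sketch, assuming you know the directory that holds app.yaml and the value of GOPATH; the function name gopath_inside_app is hypothetical. Note that the reproduction steps in the question (app.yaml in $HOME/myapp, GOPATH set to $HOME/myapp/go) match this scenario.

```python
import os


def gopath_inside_app(app_dir, gopath):
    """Return True if gopath is app_dir itself or a directory below it."""
    app_dir = os.path.abspath(app_dir)
    gopath = os.path.abspath(gopath)
    # commonpath compares whole path components, so /home/me/app is not
    # treated as a parent of /home/me/app2.
    return os.path.commonpath([app_dir, gopath]) == app_dir
```

A True result means goapp will find the GOPATH sources while scanning the app directory, which is the layout to avoid.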
I reproduced the initial steps above and got a similar error, although it did not explicitly mention syscall. However, running “goapp serve” in the appengine-hello directory results in no error at all.
Adam’s explanation at point 2 applies here correctly: one needs to place the app.yaml file at the right level in the directory structure.
sirupsen/logrus references syscall.
They provide an appengine build tag that excludes syscall so the package is usable on App Engine, something like go build -tags appengine as per issue 310.
However, I haven't yet managed to include it in an App Engine project in a way that forwards this build parameter. I'll come back to update if I manage.

Google App Engine remote_api script not found

I'm setting up remote_api locally and this time around it's not working. I'm just following the instructions on the remote_api doc page for python here: http://code.google.com/appengine/articles/remote_api.html
Which basically means I'm running the following command from the project app root (that contains app.yaml)
>> python $GAE_SDK_ROOT/remote_api_shell.py -s your_app_id.appspot.com
>> /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python: can't open file '/remote_api_shell.py': [Errno 2] No such file or directory
The your_app_id parameter is replaced with my actual app ID.
It's probably a simple thing, but not sure what it is.
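You need to replace $GAE_SDK_ROOT with the actual root directory of your GAE SDK. The path '/remote_api_shell.py' in the error shows that the variable expanded to an empty string, i.e. it isn't set in your shell. So, probably: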
python ~/google_appengine/remote_api_shell.py -s ...
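
If you would rather keep using the environment variable, a small guard can catch the unset case before the interpreter is asked to open a wrong path. This is a minimal Python sketch; remote_api_shell_path is a hypothetical helper, and it only builds the path without checking that the file exists.

```python
import os


def remote_api_shell_path(environ=None):
    """Return the path to remote_api_shell.py, or raise if GAE_SDK_ROOT is unset."""
    environ = os.environ if environ is None else environ
    sdk_root = environ.get("GAE_SDK_ROOT", "")
    if not sdk_root:
        # An unset variable expands to "" in the shell, which is how the
        # command in the question ended up looking for /remote_api_shell.py.
        raise RuntimeError(
            "GAE_SDK_ROOT is not set; point it at your GAE SDK directory")
    return os.path.join(os.path.expanduser(sdk_root), "remote_api_shell.py")
```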

How do I change the default location of the log files for GAE's bulkloader?

While working on my GAE project under my dev environment, whenever I upload data to my dev datastore, the logfiles are stored in my current directory, for instance:
C:\dev\ls
bulkloader-log-20090912.104643
bulkloader-log-20090912.104648
bulkloader-log-20090912.104731
bulkloader-log-20090912.105526
bulkloader-log-20090912.110428
bulkloader-progress-20090912.104648.sql3
bulkloader-progress-20090912.104731.sql3
bulkloader-progress-20090912.105526.sql3
bulkloader-progress-20090912.110428.sql3
project
project is my GAE app. The listing above is generated when I run the command appcfg.py upload_data. Is there a way to tell GAE where to store those log files, for instance in a log folder?
Use the --log_file=... option to appcfg.py, as documented here: with this command line option you can give the complete path to the log file, including folder and name. (You cannot give JUST the folder and let it figure out the name; for that, you need to write a tiny script that figures out the name then calls appcfg.py).
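
The "tiny script" mentioned above could look roughly like the following Python sketch. It assumes you want to keep the default naming pattern seen in the listing (bulkloader-log-YYYYMMDD.HHMMSS); build_upload_cmd is a hypothetical helper, and extra_args stands for whatever other upload_data arguments you already pass. How appcfg.py itself is invoked (directly, or via python with a full path) depends on your setup.

```python
import os
from datetime import datetime


def build_upload_cmd(log_dir, extra_args=(), now=None):
    """Build an appcfg.py upload_data command that logs into log_dir."""
    now = datetime.now() if now is None else now
    # Mirror the naming seen in the listing: bulkloader-log-YYYYMMDD.HHMMSS
    log_name = now.strftime("bulkloader-log-%Y%m%d.%H%M%S")
    log_file = os.path.join(log_dir, log_name)
    return ["appcfg.py", "upload_data", "--log_file=" + log_file] + list(extra_args)
```

The returned list can be handed to subprocess.run once the log directory exists.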
