Issue deploying to AppEngine (Flex) from Cloud Build - google-app-engine

I'm having trouble deploying to the App Engine flexible environment via Cloud Build. It used to work fine, but it stopped working today. Here's the trace shown in the Cloud Build logs.
... (skipped composer's things) ...
Step #1 Updating service [legacy-api] (this may take several minutes)...
Step #1: ..............................................................................................................................................................................................................................................................................................failed.
Step #1: ERROR: (gcloud.app.deploy) Error Response: [13] Flex operation projects/MY-PROJECT/regions/asia-east2/operations/... error [INTERNAL]: An internal error occurred while processing task /appengine-flex-v1/insert_flex_deployment/flex_create_resources>2019-11-27T07:24:52.924Z46964.jo.8: Deployment Manager operation …/operation-… errors: [code: "RESOURCE_ERROR"
Step #1: location: "/deployments/…/resources/…-00it"
Step #1: message: "{\"ResourceType\":\"compute.v1.instanceTemplate\",\"ResourceErrorCode\":\"400\",\"ResourceErrorMessage\":{\"code\":400,\"errors\":[{\"domain\":\"global\",\"message\":\"Invalid value for field \'resource.properties.labels\': \'\'. Label value \'Infinity\' violates format constraints. The value can only contain lowercase letters, numeric characters, underscores and dashes. The value can be at most 63 characters long. International characters are allowed.\",\"reason\":\"invalid\"}],\"message\":\"Invalid value for field \'resource.properties.labels\': \'\'. Label value \'Infinity\' violates format constraints. The value can only contain lowercase letters, numeric characters, underscores and dashes. The value can be at most 63 characters long. International characters are allowed.\",\"statusMessage\":\"Bad Request\",\"requestPath\":\"https://compute.googleapis.com/compute/v1/projects/.../global/instanceTemplates\",\"httpMethod\":\"POST\"}}"
Step #1: ]
Here's my cloudbuild.yaml
steps:
- name: 'gcr.io/$PROJECT_ID/secrets:latest'
  entrypoint: sh
  args:
  - "-c"
  - |
    cat /secrets/$_ENV/environments/${_SERVICE_NAME}.env > .env
- name: 'gcr.io/cloud-builders/gcloud'
  args: ["app", "deploy", "--version=$SHORT_SHA", "--promote", "--stop-previous-version", "./app.yaml"]
timeout: 1200s
substitutions:
  _ENV: staging
  _SERVICE_NAME: legacy-api
Here's my app.yaml
service: legacy-api
runtime: php
env: flex
runtime_config:
  document_root: public
  enable_stackdriver_integration: true
resources:
  cpu: 4
  memory_gb: 8
beta_settings:
  cloud_sql_instances: ${CLOUD_SQL_INSTANCE}
network:
  name: default
This is now blocking my development process. Please help.
Thanks in advance!
Edit 2019-11-27 23:00 GMT+0700 (12 hours after the problem was first seen)
Things got worse and I had no idea what to do, so I tried changing the deployment target to the Standard Environment. This is what I got:
Starting Step #1
Step #1: Already have image (with digest): gcr.io/cloud-builders/gcloud
Step #1: Services to deploy:
Step #1:
Step #1: descriptor: [/workspace/app.yaml]
Step #1: source: [/workspace]
Step #1: target project: [....]
Step #1: target service: [legacy-api-std]
Step #1: target version: [201911272240]
Step #1: target url: [https://legacy-api-std-dot-....appspot.com]
Step #1:
Step #1:
Step #1: Do you want to continue (Y/n)?
Step #1: Beginning deployment of service [legacy-api-std]...
Step #1: Created .gcloudignore file. See `gcloud topic gcloudignore` for details.
Step #1: #============================================================#
Step #1: #= Uploading 1 file to Google Cloud Storage =#
Step #1: #============================================================#
Step #1: File upload done.
Step #1: ERROR: (gcloud.app.deploy) INVALID_ARGUMENT: This deployment has too many files. New versions are limited to 10000 files for this app.
Step #1: - '#type': type.googleapis.com/google.rpc.BadRequest
Step #1: fieldViolations:
Step #1: - description: This deployment has too many files. New versions are limited to 10000
Step #1: files for this app.
Step #1: field: version.deployment.files[...]
Finished Step #1
ERROR
ERROR: build step 1 "gcr.io/cloud-builders/gcloud" failed: exit status 1
Update 2019-11-28 08:40:00+0700 (almost 24 hours since the problem was first seen)
A case was opened with Google Cloud Support 8 hours ago.
Still no miracle here.

I have been able to find out that this is a known issue in GCP and that the team is currently working on it.
The error occurs because the SHORT_SHA you are using matches one of the regexes used by the internal Deployment Manager YAML parser. The known issue is triggered by passing a SHA string that the parser evaluates as a float rather than a string. Passing a timestamp as a value likewise causes the parser to evaluate the string as a timestamp instead of a string.
Adding a random string at the end of the SHORT_SHA forces it to be treated as a string and therefore avoids the issue:
"${SHORT_SHA}xyz"
The regex used for floating-point numbers can be found here.
I have also created this PIT to keep track of the engineers' investigation. Further information will be shared in that thread.
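To make the failure mode described above concrete, here is a minimal Python sketch of a pre-deploy check. It rests on assumptions: Python's built-in float() is used only as a rough stand-in for the Deployment Manager parser's float regex (which is not reproduced here), and the helper names looks_like_float and safe_version, as well as the default "xyz" suffix, are hypothetical.

```python
import math


def looks_like_float(short_sha: str) -> bool:
    """Return True if the string would parse as a number.

    Assumption: float() only approximates the real YAML parser's rules.
    """
    try:
        float(short_sha)
    except ValueError:
        return False
    return True


def safe_version(short_sha: str, suffix: str = "xyz") -> str:
    """Append a non-numeric suffix so the value can only be read as a string."""
    return f"{short_sha}{suffix}"


if __name__ == "__main__":
    # "12e4567" is a valid 7-character hex SHA that also reads as a float.
    for sha in ["12e4567", "1234567", "a1b2c3d"]:
        print(sha, looks_like_float(sha), safe_version(sha))
    # A huge exponent overflows to infinity, which is consistent with the
    # 'Infinity' label value reported in the error message.
    print(math.isinf(float("12e4567")))
```

A check like this could run before the deploy step to flag commits whose short SHA happens to look numeric, although always appending the suffix, as the answer suggests, is simpler.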

Related

App deploy fails on GCP App Engine (GAE) flexible when specifying engines.node version

gcloud --project my-project-id app deploy app.yaml
fails on GAE Flex when I specify an engines.node version in package.json
but succeeds when I remove it from package.json, which contradicts the https://cloud.google.com/appengine/docs/flexible/nodejs/runtime documentation.
I have specified
"engines": {
"node": "16.x"
}
in my package.json
The error I ultimately get is:
ERROR: gcloud crashed (IOError): [Errno 11] Resource temporarily unavailable
The error seems to be transient: on very rare occasions in the past 24 hours, the app deploy succeeded.
Google folks, can you confirm the problem is on your side?
I am deploying on GAE Flex in asia-southeast2-a (Jakarta)
Thanks
Below is a more complete log trace:
BUILD
Starting Step #0
Step #0: Pulling image: gcr.io/gcp-runtimes/nodejs/gen-dockerfile@sha256:770f37e7042652138c7dac203fc35ef0218002515ddd9f311db1c6c54d6816aa
Step #0: gcr.io/gcp-runtimes/nodejs/gen-dockerfile@sha256:770f37e7042652138c7dac203fc35ef0218002515ddd9f311db1c6c54d6816aa: Pulling from gcp-runtimes/nodejs/gen-dockerfile
Step #0: Digest: sha256:770f37e7042652138c7dac203fc35ef0218002515ddd9f311db1c6c54d6816aa
Step #0: Status: Downloaded newer image for gcr.io/gcp-runtimes/nodejs/gen-dockerfile@sha256:770f37e7042652138c7dac203fc35ef0218002515ddd9f311db1c6c54d6816aa
Step #0: gcr.io/gcp-runtimes/nodejs/gen-dockerfile@sha256:770f37e7042652138c7dac203fc35ef0218002515ddd9f311db1c6c54d6816aa
Step #0: Checking for Node.js.
Finished Step #0
Starting Step #1
Step #1: Already have image (with digest): gcr.io/kaniko-project/executor@sha256:f87c11770a4d3ed33436508d206c584812cd656e6ed08eda1cff5c1ee44f5870
Step #1: INFO[0000] Removing ignored files from build context: [node_modules .dockerignore Dockerfile npm-debug.log yarn-error.log .git .hg .svn app.yaml]
Step #1: INFO[0000] Downloading base image gcr.io/google-appengine/nodejs@sha256:721ce182842495a610589261688f1abc1801c915a7e24880fa15af9e9d725459
Step #1: INFO[0017] Taking snapshot of full filesystem...
Step #1: INFO[0026] Using files from context: [/workspace]
Step #1: INFO[0026] COPY . /app/
Step #1: INFO[0026] Taking snapshot of files...
Step #1: INFO[0026] RUN /usr/local/bin/install_node '16.x'
Step #1: INFO[0026] cmd: /bin/sh
Step #1: INFO[0026] args: [-c /usr/local/bin/install_node '16.x']
Step #1: % Total % Received % Xferd Average Speed Time Time Time Current
Step #1: Dload Upload Total Spent Left Speed
100 30.8M 100 30.8M 0 0 68.9M 0 --:--:-- --:--:-- --:--:-- 68.9M
Step #1: % Total % Received % Xferd Average Speed Time Time Time Current
Step #1: Dload Upload Total Spent Left Speed
100 4035 100 4035 0 0 35238 0 --:--:-- --:--:-- --:--:-- 35394
Step #1: node-v16.13.0-linux-x64.tar.gz: OK
Step #1: Installed Node.js v16.13.0
Step #1: INFO[0028] Taking snapshot of full filesystem...
Step #1: INFO[0030] Adding whiteout for /nodejs/lib/node_modules/npm/node_modules/npm-install-checks/CHANGELOG.md
Step #1: INFO[0030] Adding whiteout for /nodejs/lib/node_modules/npm/node_modules/os-tmpdir
Step #1: INFO[0030] Adding whiteout for /nodejs/lib/node_modules/npm/node_modules/npmlog/CHANGELOG.md
Step #1: INFO[0030] Adding whiteout for /nodejs/lib/node_modules/npm/node_modules/lodash.uniq
(...)
ERROR: gcloud crashed (IOError): [Errno 11] Resource temporarily unavailable
Thanks
[Update] I have upgraded the gcloud SDK and the error message is a little different now. I get
[Errno 11] write could not complete without blocking
in the middle of the long logging stream generated by the gcloud app deploy command.
Following the advice of other developers (related to Python library issues, link), I found a workaround: adding the --no-user-output-enabled flag to the gcloud app deploy command. This prevents the error from happening.
But this is only a temporary workaround, and I hope the GCP team will fix the issue in the CLI library.
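For anyone scripting the deploy, the workaround can be sketched in Python as below. This is only an illustration: the deploy_command helper and its parameters are hypothetical, and the only gcloud pieces used are the command and the --no-user-output-enabled flag already mentioned in this post.

```python
import subprocess


def deploy_command(app_yaml="app.yaml", project=None):
    """Build the gcloud argument list with user output disabled."""
    cmd = ["gcloud"]
    if project:
        cmd += ["--project", project]
    cmd += ["app", "deploy", app_yaml, "--no-user-output-enabled"]
    return cmd


def deploy(app_yaml="app.yaml", project=None):
    """Run the deploy and return gcloud's exit code."""
    return subprocess.run(deploy_command(app_yaml, project)).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(deploy_command(project="my-project-id")))
```

Since the flag suppresses the progress output, the exit code returned by deploy() is the main signal of success or failure.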

Why does redirecting stderr make my App Engine deployment fail consistently?

I am deploying a trivial App Engine Standard Environment app (literally the shortest possible: a Python 3 "hello world"). I am using a MacBook with zsh.
If I redirect standard error to file, I get an error (below) every time.
gcloud app deploy -q 2>>err.log
If I omit the redirection, it succeeds every time.
There is no difference between using > or >>. Redirecting with a pipe, e.g. to grep, does not cause the problem.
So here is a "solution" (sending the output through a passthrough grep) that does what I need and does not trigger the problem, but it is very roundabout.
gcloud app deploy -q 2>&1 >/dev/null |egrep "." >> err.txt
Note that I use -q, so waiting for my Y for approval is not the issue.
The error is this. (Identifiers were anonymized.)
..................failed.
ERROR: (gcloud.app.deploy) Error Response: [9] Cloud build BUILD_ID status: FAILURE
Build error details: Failed to download at least one file. Cannot continue.
Full build logs: https://console.cloud.google.com/cloud-build/builds/BUILD_ID?project=PROJECT_ID
Looking at the logs, I see this.
starting build "BUILD_ID"
FETCHSOURCE
BUILD
Starting Step #0 - "fetcher"
Step #0 - "fetcher": Already have image (with digest): gcr.io/cloud-builders/gcs-fetcher
Step #0 - "fetcher": Fetching manifest gs://staging.joshua-playground.appspot.com/ae/BUILD_ID/manifest.json.
Step #0 - "fetcher": Processing 728 files.
Step #0 - "fetcher": Failed to fetch gs://staging.my-project.appspot.com/BUILD_ID, will no longer retry: fetching "gs://staging.my-project.appspot.com/BUILD_ID" with timeout 1h0m0s to temp file "/workspace/.download/staging.joshua-playground.appspot.com-BUILD_ID": err SHA mismatch, got "SHA_VALUE", want "SHA_VALUE"
Step #0 - "fetcher": Failed to download at least one file. Cannot continue.
Finished Step #0 - "fetcher"
ERROR
ERROR: build step 0 "gcr.io/cloud-builders/gcs-fetcher" failed: step exited with non-zero status: 1
Google has confirmed this issue. Please track it here.
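Until the issue is fixed, the pipe-based workaround from the question can also be sketched in Python. The idea is the same: stderr is read through a pipe and appended to a log file, rather than being redirected straight to a file. The function name run_logging_stderr is hypothetical, and the demo uses a harmless stand-in command instead of gcloud.

```python
import subprocess
import sys


def run_logging_stderr(cmd, log_path):
    """Run cmd, discard stdout, and append non-empty stderr lines to log_path.

    Mirrors the `2>&1 >/dev/null | egrep "."` form from the question.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    with open(log_path, "a", encoding="utf-8") as log:
        for line in proc.stderr:
            if line.rstrip("\r\n"):  # like egrep ".", skip empty lines
                log.write(line)
    return proc.wait()


if __name__ == "__main__":
    # Stand-in command; replace with ["gcloud", "app", "deploy", "-q"].
    demo = [sys.executable, "-c", "import sys; sys.stderr.write('oops\\n')"]
    print(run_logging_stderr(demo, "err.txt"))
```

Whether this avoids the failure in every environment is untested here; it only reproduces the shape of the shell workaround that was reported to work.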

Google Cloud Build - source-context.json SHA mismatch

I have a Python 3 project which I am hosting on Google App Engine Standard. Until a couple of days ago I was able to deploy normally (ever since I did the initial setup in July 2019). Now I get the following response:
starting build "abc"
FETCHSOURCE
BUILD
Starting Step #0 - "fetcher"
Step #0 - "fetcher": Already have image (with digest): gcr.io/cloud-builders/gcs-fetcher
Step #0 - "fetcher": Fetching manifest gs://staging.my-project.appspot.com/ae/xxx/manifest.json.
Step #0 - "fetcher": Processing 312 files.
Step #0 - "fetcher": Failed to fetch gs://staging.my-project.appspot.com/xxx, will no longer retry: fetching "gs://staging.my-project.appspot.com/xxx" with timeout 1h0m0s to temp file "/workspace/.download/staging.my-project.appspot.com-xxx": source-context.json SHA mismatch, got "xxx", want "yyy"
Step #0 - "fetcher": Failed to download at least one file. Cannot continue.
Finished Step #0 - "fetcher"
ERROR
ERROR: build step 0 "gcr.io/cloud-builders/gcs-fetcher" failed: step exited with non-zero status: 1
Any idea why this would be happening and how to fix it?
P.S. I use the following command for deployment:
gcloud --project my-project app deploy app.yaml
After a conversation with Google engineers (https://issuetracker.google.com/issues/154588981?pli=1), the following worked:
1. Remove the source-context.json file
2. Delete the bucket where the deployment files are stored, e.g. gs://staging.my-project.appspot.com
3. Deploy again
If you need the source-context.json file, you can follow these steps: https://cloud.google.com/debugger/docs/source-context
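The first two steps of that fix can be sketched in Python as follows. This is a hedged illustration, not an official procedure: the helper names are hypothetical, the bucket name follows the staging.PROJECT.appspot.com pattern shown in the logs above, and the gsutil command is only built, never executed, because deleting a bucket is destructive.

```python
from pathlib import Path


def remove_source_context(project_dir):
    """Delete source-context.json from project_dir; return True if it existed."""
    path = Path(project_dir) / "source-context.json"
    if path.is_file():
        path.unlink()
        return True
    return False


def bucket_delete_command(project_id):
    """Build (but do not run) a gsutil command that removes the staging bucket."""
    return ["gsutil", "rm", "-r", f"gs://staging.{project_id}.appspot.com"]


if __name__ == "__main__":
    print(remove_source_context("."))
    # Review the command before running it by hand.
    print(" ".join(bucket_delete_command("my-project")))
```

After both steps, running gcloud app deploy again recreates the staging files.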
This is what I did, and it worked for me:
1. Go to the link given in the error description.
2. There you'll see some lines for the staging failure. Copy the name of the file in the first line.
3. Go to Google Cloud Storage here and open the bucket whose name starts with "staging.***". Inside that bucket, search for the string you copied in step 2.
4. Delete that file, then repeat these steps for every line shown at the error details link (in my example there were 4 rows).
5. Deploy again!
I had a similar problem after I changed a local file during a gcloud app deploy with a long upload (2200 files, because of a wrong .gcloudignore).
I fixed it by deleting the changed file in the cloud storage browser, and deploying again with gcloud app deploy.
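To find which local files changed after they were staged, a comparison like the following Python sketch may help. It assumes the staged files are identified by SHA-1 digests and that you already have a mapping from relative path to expected digest. Both of those, along with the function names, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha1_of(path):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(root, expected):
    """Return relative paths that are missing or whose SHA-1 differs.

    `expected` maps relative path -> expected hex digest (hypothetical input).
    """
    root = Path(root)
    mismatched = []
    for rel, want in sorted(expected.items()):
        path = root / rel
        if not path.is_file() or sha1_of(path) != want:
            mismatched.append(rel)
    return mismatched
```

Any path this reports is a candidate for deleting the corresponding object in the staging bucket before redeploying.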
For anyone using a Next.js project, the solution may be to delete the .next folder and rebuild your project. It seems that some of the cache files from Next.js can get out of sync.
From the root of your project:
rm -rf .next && next build
Then redeploy as usual.

GAE deployment failing

I'm using GAE for a Laravel PHP site running on flex instances. I've never had a problem doing a "gcloud app deploy" to get my app deployed. However, for the last 24 hours or so, I get the following error when I attempt to deploy:
Step #1: Package manifest generated successfully.
Step #1: > chmod -R 755 bootstrap/cache
Step #1: > php artisan cache:clear
Step #1:
Step #1: In AbstractConnection.php line 155:
Step #1:
Step #1: Connection timed out [tcp://1.2.3.4:6379]
Step #1:
Step #1:
Step #1: Script php artisan cache:clear handling the post-install-cmd event returned with error code 1
Step #1: The command '/bin/sh -c /build-scripts/composer.sh' returned a non-zero code: 1
Finished Step #1
ERROR
ERROR: build step 1 "gcr.io/cloud-builders/docker@sha256:12345" failed: exit status 1
Step #1:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ERROR: (gcloud.app.deploy) Cloud build failed. Check logs at https://console.cloud.google.com/gcr/builds/12345?project=1234 Failure status: UNKNOWN: Error Response: [2] Build failed; check build logs for details
I have a Memorystore (redis) instance I use since GAE memcache isn't available on flex instances yet. My app uses redis as a cache, so as you can see above, once the new code is deployed composer is configured to clear the cache, which is where it's timing out and failing.
If I SSH into an existing instance, I can run php artisan cache:clear no problem. However, it's failing on deploy. It's a pretty simple code change that's only UI tweaks (html/javascript) so none of the redis or connection code has changed.
Any ideas?

Cannot find package when deploy to App Engine

I cannot deploy my Go application (using the Echo framework) to App Engine.
I get an error like this:
...
Step #0: main.go:4:2: cannot find package "FBackend/router" in any of:
Step #0: /usr/local/go/src/FBackend/router (from $GOROOT)
Step #0: /workspace/_gopath/src/FBackend/router (from $GOPATH)
Finished
Step #0 ERROR
ERROR: build step 0 "..." failed: exit status 1
In the project I have a file tree like this:
FBackend
...
|___router
| |____router.go
...
|
|___main.go
On localhost everything works fine.
Judging from the error message, you should set one of the paths to the "src" folder.
They should be able to find "FBackend/router" in "/usr/local/go/src/" since it's there.
I can understand why it can't find "FBackend/router" in "/usr/local/go/src/FBackend/router", because you probably don't have "/usr/local/go/src/FBackend/router/FBackend/router" set up.

Resources