Thanos Store error: "bucket store initial sync: sync block: iter: The specified key does not exist"

I am new to Thanos and Prometheus. I was trying to set up the Thanos components in our Kubernetes cluster.
It is freshly set up, using S3 as the object storage.
Used image versions:
Prometheus version: v2.4.3
Thanos version: v0.8.0
The Thanos sidecar is working with the S3 storage config. However, when deploying Thanos Store I am hitting this error: caller=main.go:200 err="store command failed: bucket store initial sync: sync block: iter: The specified key does not exist"
No data has been shipped to S3 yet, as I need to wait for the default two-hour upload interval.

I changed the bucket endpoint to "s3.amazonaws.com", and after that all the other Thanos components worked properly. Reference: https://github.com/thanos-io/thanos/issues/2777
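For reference, here is a minimal sketch of the objstore config that the sidecar and the store gateway share; the bucket name, region, and credentials are placeholders, and only the endpoint value reflects the fix above:

type: S3
config:
  bucket: "my-thanos-bucket"     # placeholder
  endpoint: "s3.amazonaws.com"   # the endpoint that made the store gateway work
  region: "us-east-1"            # placeholder
  access_key: "<ACCESS_KEY>"     # placeholder
  secret_key: "<SECRET_KEY>"     # placeholder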

Related

Running a Google App Engine deployment to another project through Cloud Build

I have a project called "RnD" (with the ID: 1111111) in Google Cloud where all the repositories and the Cloud Build triggers live.
Now I want to run a Cloud Build trigger in the "RnD" project which then deploys to App Engine in project "X" (with the ID: 99999999). I gave the Cloud Build service account in the "RnD" project the following permissions in project "X":
App Engine Admin
Service Account User
Project Browser
In project "X", App Engine is active and configured; on the RnD project it is not, since it's not used there.
And this is my cloudbuild.yaml file:
steps:
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  dir: 'api'
  entrypoint: 'bash'
  args: ['-c', 'gcloud config set project ${_TARGET_PROJECT_NAME} && gcloud config set app/cloud_build_timeout 1600 && gcloud app deploy ']
timeout: '1600s'
_TARGET_PROJECT_NAME is a substitution configured on the trigger, and its value is the name of project "X".
Running a build returns the following logs.
starting build "xxxxxxxxxx"
FETCHSOURCE
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: git branch -m <name>
Initialized empty Git repository in /workspace/.git/
From https://source.developers.google.com/p/rnd/r/my_reponame
* branch xxxxxxxxxxxx -> FETCH_HEAD
HEAD is now at xxxxxx
BUILD
Pulling image: gcr.io/google.com/cloudsdktool/cloud-sdk
Using default tag: latest
latest: Pulling from google.com/cloudsdktool/cloud-sdk
0bc3020d05f1: Already exists
a5178f1195d4: Pulling fs layer
... blah blah
cc6c9aaa8146: Pull complete
Digest: sha256:xxxxxxxxx
Status: Downloaded newer image for gcr.io/google.com/cloudsdktool/cloud-sdk:latest
gcr.io/google.com/cloudsdktool/cloud-sdk:latest
Updated property [core/project].
WARNING: You do not appear to have access to project [X] or it does not exist.
Updated property [app/cloud_build_timeout].
API [appengine.googleapis.com] not enabled on project [1111111].
Would you like to enable and retry (this will take a few minutes)?
(y/N)?
ERROR: (gcloud.app.deploy) User [1111111#cloudbuild.gserviceaccount.com] does not have permission to access apps instance [X] (or it may not exist): App Engine Admin API has not been used in project 1111111 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/appengine.googleapis.com/overview?project= 1111111 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.
- '#type': type.googleapis.com/google.rpc.Help
links:
- description: Google developers console API activation
url: https://console.developers.google.com/apis/api/appengine.googleapis.com/overview?project= 1111111
- '#type': type.googleapis.com/google.rpc.ErrorInfo
domain: googleapis.com
metadata:
consumer: projects/1111111
service: appengine.googleapis.com
reason: SERVICE_DISABLED
ERROR
ERROR: build step 0 "gcr.io/google.com/cloudsdktool/cloud-sdk" failed: step exited with non-zero status: 1
Looks like I had to enable the App Engine Admin API on the RnD project too, which somehow makes sense the more I think about it.
In addition to that, I had to give the Cloud Build service account more permissions in project "X". I have not yet figured out the minimum permission set for this service account. It works if I give the service account Project Owner rights (which I know I shouldn't ;) ).
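In concrete terms, the fix looks roughly like this with gcloud; the project IDs are the ones from the question, and the roles shown only mirror the ones granted in the question rather than a verified minimum set:

# Enable the App Engine Admin API on the RnD project that runs the trigger
gcloud services enable appengine.googleapis.com --project=1111111

# Grant the RnD Cloud Build service account access to deploy into project "X"
gcloud projects add-iam-policy-binding 99999999 \
  --member="serviceAccount:1111111@cloudbuild.gserviceaccount.com" \
  --role="roles/appengine.appAdmin"
gcloud projects add-iam-policy-binding 99999999 \
  --member="serviceAccount:1111111@cloudbuild.gserviceaccount.com" \
  --role="roles/iam.serviceAccountUser"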

gcloud cloud-build-local component failing with error "Error loading config file: unknown field "availableSecrets" in cloudbuild.Build"

Greetings stackoverflow community! First time asker, long time user.
I am testing out my cloudbuild.yaml file locally using the Cloud Build Local component with Secret Manager, and it is failing on "availableSecrets".
Error message: Error loading config file: unknown field "availableSecrets" in cloudbuild.Build
OS Platform: Windows 10/WSL2/Ubuntu 18.04
cloud-build-local: v0.5.2
Docker engine: v20.10.2
Nodejs version: v14.15.3
NPM version: 6.14.9
gcloud version: 326.0.0
Installed components: [BigQuery Command Line Tool, Cloud Datastore Emulator, Cloud SDK Core Libraries, Cloud Storage Command Line Tool, Google Cloud Build Local Builder, gcloud Beta Commands]
Documentation on Cloud Build build file: https://cloud.google.com/cloud-build/docs/build-config
Documentation to configure secrets with cloud build: https://cloud.google.com/cloud-build/docs/securing-builds/use-secrets
Documentation for cloud build local: https://cloud.google.com/cloud-build/docs/build-debug-locally
Steps performed:
Added secrets to Secret Manager
Enabled the API between Cloud Build and Secret Manager
Added the Cloud Build service account as a member on each secret
Added the IAM role Secret Manager Secret Accessor to the Cloud Build user. I don't know where I got this from; it is residual at this point from other attempts to use Secret Manager with Cloud Build. I am not sure of the difference between applying access here versus applying it on the Secret Manager secret itself (a sketch of the per-secret binding follows this list).
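For context, granting access on the secret itself (as opposed to the project-level role) looks like this; the secret name and project number are placeholders:

gcloud secrets add-iam-policy-binding SYSTEM_USER \
  --member="serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"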
Command: cloud-build-local --config=cloudbuild.staging.yaml --dryrun=false .
cloudbuild.staging.yaml:
steps:
- name: gcr.io/cloud-builders/npm
  entrypoint: 'npm'
  args: [ 'install' ]
- name: 'gcr.io/cloud-builders/gcloud'
  args: ["app", "deploy"]
  env:
  - 'DAO_FACTORY=datastore'
  - 'POLL_INTERVAL=15'
  - 'PROMPT=staging>'
  - 'ENVIRONMENT=staging'
  - 'NAMESPACE=staging'
  - 'RESET_DATASTORE=false'
  secretEnv: ['ADMIN_USER', 'SUPER_ADMINS', 'BOT_TOKEN']
availableSecrets:
  secretManager:
  - versionName: projects/{project token}/secrets/SYSTEM_USER/versions/1
    env: 'ADMIN_USER'
  - versionName: projects/{project token}/secrets/SUPER_ADMINS/versions/1
    env: 'SUPER_ADMINS'
  - versionName: projects/{project token}/secrets/BOT_TOKEN/versions/2
    env: 'BOT_TOKEN'
Tag: cloud-build-local. I guess without reputation a meaningful tag cannot be created. Maybe an esteemed community member will create this as this may be specific to cloud-build-local only.
Support for Google Secret Manager in the Google Cloud Build descriptor file is apparently very new and does not appear to be supported by the cloud-build-local component at this time; please see the comment from Guillaume about the feature being about a week old. When the Cloud Build descriptor is run in Cloud Build, it works fine.
I fixed a similar issue by upgrading the gcloud tool.
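For completeness, assuming cloud-build-local was installed as a gcloud component, updating everything is a single command:

# Updates the SDK and all installed components, including cloud-build-local
gcloud components update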

Using NFS with Ververica for artifact storage not working, throwing Error: No suitable artifact fetcher found for scheme file

I'm trying to set up Ververica Community Edition to use NFS for artifact storage, using the following values.yaml:
vvp:
  blobStorage:
    baseUri: file:///var/nfs/export
volumes:
  - name: nfs-volume
    nfs:
      server: "host.docker.internal"
      path: "/MOUNT_POINT"
volumeMounts:
  - name: nfs-volume
    mountPath: /var/nfs
When deploying the Flink job, I use the jarUri below:
jarUri: file:///var/nfs/artifacts/namespaces/default/flink-job.jar
I am able to see my artifacts in the Ververica UI; however, when I try to deploy the Flink job, it fails with the following exception:
Error: No suitable artifact fetcher found for scheme file
Full error:
Some pod containers have been restarted unexpectedly. Init containers reported the following reasons: [Error: No suitable artifact fetcher found for scheme file]. Please check the Kubernetes pod logs if your application does not reach its desired state.
If I remove the "file://" prefix from the jarUri, leaving just the following, the job containers keep restarting without giving any error.
jarUri: /var/nfs/artifacts/namespaces/default/flink-job.jar
As a side note, I also added the following to the deployment.yaml. If I set the artifact to pull from an HTTP endpoint, it does save the checkpoints correctly on the NFS, so it seems that the only problem is loading artifacts from the NFS using the file:// scheme.
kubernetes:
  pods:
    volumeMounts:
      - name: my-volume
        volume:
          name: my-volume
          nfs:
            path: /MOUNT_POINT
            server: host.docker.internal
        volumeMount:
          mountPath: /var/nfs
          name: my-volume
Ververica Platform does not currently support NFS drives for Universal Blob Storage.
However, you can emulate this behavior if you are using version >= 2.3.2 by mounting the NFS drive to your Flink pods, as you did in the deployment spec for checkpoints. This works because 2.3.2 added support for self-contained artifacts and for fetching local files. You can see more information in the Ververica documentation.
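Putting the two pieces from the question together, the deployment spec on >= 2.3.2 would look roughly like this; this is only a sketch reusing the names and paths from the question, and the exact behaviour depends on your platform version:

spec:
  template:
    spec:
      artifact:
        kind: JAR
        jarUri: file:///var/nfs/artifacts/namespaces/default/flink-job.jar
      kubernetes:
        pods:
          volumeMounts:
            - name: my-volume
              volume:
                name: my-volume
                nfs:
                  path: /MOUNT_POINT
                  server: host.docker.internal
              volumeMount:
                name: my-volume
                mountPath: /var/nfs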

How to Add Environment Variables to Google App Engine

I have deployed my Django Project to Google App Engine and I need to add environment variables.
The docs say to add them to app.yaml but that seems like bad practice because app.yaml should be in your git repository.
Is there any way to add environment variables to App Engine the same way you can add them in Cloud Run > Services > Variables & Secrets?
Google Secret Manager has been available since this spring:
Enable Secret Manager API
Add the Secret Manager Secret Accessor role to the App Engine SA
Create secrets from the GCP web UI or programmatically (the code examples are from the official documentation):
def create_secret(project_id, secret_id):
    """
    Create a new secret with the given name. A secret is a logical wrapper
    around a collection of secret versions. Secret versions hold the actual
    secret material.
    """
    # Import the Secret Manager client library.
    from google.cloud import secretmanager

    # Create the Secret Manager client.
    client = secretmanager.SecretManagerServiceClient()

    # Build the resource name of the parent project.
    parent = client.project_path(project_id)

    # Create the secret.
    response = client.create_secret(parent, secret_id, {
        'replication': {
            'automatic': {},
        },
    })

    # Print the new secret name.
    print('Created secret: {}'.format(response.name))
Consume the secrets from the app instead of the environment variables:
def access_secret_version(project_id, secret_id, version_id):
    """
    Access the payload for the given secret version if one exists. The version
    can be a version number as a string (e.g. "5") or an alias (e.g. "latest").
    """
    # Import the Secret Manager client library.
    from google.cloud import secretmanager

    # Create the Secret Manager client.
    client = secretmanager.SecretManagerServiceClient()

    # Build the resource name of the secret version.
    name = client.secret_version_path(project_id, secret_id, version_id)

    # Access the secret version.
    response = client.access_secret_version(name)

    # Print the secret payload.
    #
    # WARNING: Do not print the secret in a production environment - this
    # snippet is showing how to access the secret material.
    payload = response.payload.data.decode('UTF-8')
    print('Plaintext: {}'.format(payload))
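As a hedged illustration for the Django case, settings.py could fetch the secret at startup instead of reading an environment variable; the project and secret names below are placeholders, and the helper simply returns the payload rather than printing it:

# settings.py (sketch)
from google.cloud import secretmanager

def _get_secret(project_id, secret_id, version_id='latest'):
    # Same client calls as above, but returning the payload.
    client = secretmanager.SecretManagerServiceClient()
    name = client.secret_version_path(project_id, secret_id, version_id)
    return client.access_secret_version(name).payload.data.decode('UTF-8')

SECRET_KEY = _get_secret('my-gcp-project', 'DJANGO_SECRET_KEY')  # placeholders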
If you are using a continuous deployment process, you could rewrite (or create) the app.yaml to include variables relevant to each deployment target within the CD build system.
We rewrite several files as part of our deployment process to App Engine using Bitbucket Pipelines. Variables can be defined at the workspace level (across multiple repositories), within a repository, and also for each deployment target defined. These variables can be secured so they are not readable.
build: &build
  - step:
      name: Update configuration for deployment
      script:
        - find . -type f -name "*.yaml" -exec sed -i "s/\[secret-key-placeholder\]/$SECRET_KEY/g" {} +
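For example, a placeholder in an app.yaml template that the sed step above rewrites might look like this; the variable name is just an illustration:

# app.yaml fragment
env_variables:
  SECRET_KEY: "[secret-key-placeholder]"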
Refer to https://support.atlassian.com/bitbucket-cloud/docs/variables-in-pipelines/#Deployment-variables

gcloud app deploy does not terminate even when service is running

I am deploying a Node.js server to Google App Engine from a Bitbucket Pipelines environment, and the last command in the script is: gcloud -q app deploy app.yaml --no-promote --verbosity=debug
The logs show that the service is deployed successfully, but the script is not terminating. This is the last part of the log:
> DEBUG: Reading GCS logfile: 206 (read 10 bytes) PUSH DONE DEBUG:
> Operation [...] complete. Result: {...} DEBUG: Reading GCS logfile:
> 416 (no new content; keep polling)
> -------------------------------------------------------------------------------- DEBUG: Converted YAML to JSON: "{...}" DEBUG: Operation [...] not
> complete. Waiting to retry. Updating service [default] (this may take
> several minutes)... .DEBUG: Operation [...] not complete. Waiting to
> retry. ......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
> ......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
I tried adding readiness_check and liveness_check to app.yaml, but it didn't change the behaviour.
readiness_check:
  path: "/api/public/logout"
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300
liveness_check:
  path: "/api/public/logout"
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
The main unknown here is: what criteria does gcloud app deploy use to determine its termination condition?
Also, is there any way to bypass this problem?
Update
The problem also happens when running the gcloud app deploy command from a local environment (my laptop).
The problem does NOT happen when removing the --no-promote flag.
The gcloud app deploy command expects a well-formed and valid app.yaml file; this is what determines its termination condition.
As you confirmed that the deployment worked without the --no-promote flag, it could mean that something in the configuration expects the application to be already deployed and running, thus preventing the script from completing.
Another possible cause would be that the Google Cloud SDK version specified in bitbucket-pipelines.yml is an older one. Make sure you work with the latest. This consideration applies extensively to all dependencies in package.json, which might be conflicting with one another, especially when using older versions of Node.js.
This guide can help with building a sound configuration for Bitbucket-based deployments; although the example given uses Python, it can serve as a template for a Node.js pipeline.
N.B. in this solution, the Google Cloud SDK version is an older one (127.0.0), which will make this deployment fail, so it should be replaced with the latest (228.0.0 or higher). The guide also omits another required API activation: the Cloud Build API. I've notified the team to amend the solution.
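As an illustration, a bitbucket-pipelines.yml along these lines pins a current Cloud SDK image; the key-file handling via a secured repository variable is my assumption, not something from the guide:

image: google/cloud-sdk:latest

pipelines:
  default:
    - step:
        script:
          - echo $GCLOUD_API_KEYFILE | base64 -d > key.json   # secured repo variable (assumption)
          - gcloud auth activate-service-account --key-file=key.json
          - gcloud config set project $GCLOUD_PROJECT           # assumption
          - gcloud -q app deploy app.yaml --no-promote --verbosity=debug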
I've tested several scenarios with a simple Node.js server, and could not reproduce the issue. Check my Github repository for the code.
For further help on this topic, please provide more hints, such as the content of the app.yaml, bitbucket-pipelines.yml, and package.json files, as well as a description of the state of App Engine (services, versions).
In order to deploy the test repository to App Engine from Bitbucket, make sure the following is done on the project:
Enable APIs:
App Engine Admin
Cloud Build
Create a service account with the following permissions, and generate a JSON key (see the sketch after this list):
App Engine: Admin
Cloud Build: Editor
Storage: Object Admin
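A rough gcloud equivalent of the above, assuming a project ID and a service account name of your own choosing:

gcloud services enable appengine.googleapis.com cloudbuild.googleapis.com --project=$PROJECT_ID

gcloud iam service-accounts create bitbucket-deployer --project=$PROJECT_ID
for ROLE in roles/appengine.appAdmin roles/cloudbuild.builds.editor roles/storage.objectAdmin; do
  gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:bitbucket-deployer@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="$ROLE"
done

# JSON key to store as a secured Bitbucket pipeline variable
gcloud iam service-accounts keys create key.json \
  --iam-account="bitbucket-deployer@${PROJECT_ID}.iam.gserviceaccount.com"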
