DogStatsD not sending JVM runtime metrics from Google App Engine flex environment - google-app-engine

According to Datadog, JVM metrics collection is enabled by default for Java tracer v0.29.0+:
https://docs.datadoghq.com/tracing/metrics/runtime_metrics/java/
My agent is running and trace metrics are coming in fine, but I am not seeing any data on the JVM Metrics tab in the APM section.
I confirmed with DD helpdesk that everything is configured correctly for a containerized environment. I was expecting the JVM metrics to show up automatically, as this doc describes:
https://docs.datadoghq.com/tracing/metrics/runtime_metrics/java/
app.yaml
env_variables:
  DD_AGENT_HOST: "our_gcp_host"
  DD_TRACE_AGENT_PORT: "80"
  DD_ENV: "dev"
  DD_SERVICE: "our_service_tag"
dd-app.yaml
service: dd-agent
runtime: custom
env: flex
env_variables:
  DD_APM_ENABLED: "true"
  DD_APM_NON_LOCAL_TRAFFIC: "true"
  DD_APM_RECEIVER_PORT: "8080"  # custom port configuration
  DD_DOGSTATSD_NON_LOCAL_TRAFFIC: "true"
  DD_DOGSTATSD_PORT: "8125"
network:
  forwarded_ports:
    - 8125/udp

I posted this question so that I could answer it myself. It took a few days of investigation, but we figured it out.
The solution is to deploy the agent to a Compute Engine instance. According to the colleague who figured it out, the reason is:
Although App Engine and its docs say you can forward ports, the forwarded port doesn't actually become reachable through the service's DNS name, only through the instance IPs, which change as instances come and go. We stood up a Compute Engine instance running the dd-agent and pointed our API at its IP.
GCP isn't entirely honest about port forwarding in App Engine: you can forward a port, but the App Engine DNS can't be used to reach it, so you would have to use the instance IPs. UDP load balancers also don't appear to work with App Engine, which makes the whole idea behind the port forwarding rather pointless.
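For anyone following along, the app's env_variables then point straight at the Compute Engine instance. A minimal sketch (the IP is illustrative, and the ports assume a default agent install, where the trace receiver listens on 8126 and DogStatsD on 8125/udp):
env_variables:
  DD_AGENT_HOST: "10.128.0.2"   # illustrative internal IP of the dd-agent GCE instance
  DD_TRACE_AGENT_PORT: "8126"   # default trace receiver port on a stock agent
  DD_ENV: "dev"
  DD_SERVICE: "our_service_tag"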
Try it out! We saw our metrics show up immediately.

Related

Cloud Run static outbound IP address does not go through Google App Engine firewall

I have a python (flask) application running on Google App Engine (flex); the application is protected by the GAE firewall where:
Default rule is 'Deny' all ingress
There is a whitelist of IP addresses from which traffic is allowed.
I have some microservices deployed on Cloud Run (fully managed) which:
Receive requests from the GAE app (e.g. for heavy duty tasks)
Send the results of whatever they process as http requests back to handlers/endpoints in the GAE app
Thus the GAE app is the main point of interaction with clients and a dispatcher of heavy tasks, while the processing of those tasks is carried out by the microservices. I have set up a static outbound IP address for the Cloud Run hosted service, which verifiably works: traffic is routed through the NAT gateway as required by the documentation. The respective NAT IP address is on the firewall whitelist.
The problem is that the firewall still does not let in the Cloud Run-to-GAE requests, which bounce back with 403 statuses (of course, if I change the default firewall rule to 'Allow', traffic goes through). If I host the same microservice in a Docker container on a GCE VM with a static IP address (set up like this), everything works flawlessly. This makes me hypothesize that although Cloud Run outbound traffic is indeed routed through the static IP address when the destination is outside GCP, requests to an internal (project-wise) asset still go through some dynamically selected IP, i.e. the static IP solution simply does not apply there. Unfortunately the logs don't show the 403-ed attempts, so I can't see what IP addresses those requests appear to come from (from the GAE standpoint).
I would be very grateful for ideas on how this can be fixed, as it greatly diminishes the value of the otherwise wonderful idea of having static outbound IP addresses for Cloud Run.
First, thank you both for your help and suggestions, they are very helpful. I found the solution with some kind help from Google:
When the Cloud Run microservice and the GAE app are hosted in the same project, traffic is still routed through internal channels and appears to come from IP address 0.0.0.0. That address can be whitelisted (so it works), as long as you bear in mind that it also encompasses GCP assets that are part of other projects (to the best of my understanding).
A more robust solution seems to be setting up an external-facing load balancer as described here and putting it in front of the GAE app; in that case Cloud Run will indeed consistently use its static outbound IP address, as described in the documentation.
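For reference, a whitelist rule like the 0.0.0.0 one above can be created with gcloud (assuming the standard app firewall-rules syntax; pick a priority that fits your rule ordering):
gcloud app firewall-rules create 1000 --action=allow --source-range='0.0.0.0' --description='internal GCP traffic'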
You are correct that the static IP is not honoured when packets are routed internally within GCP.
I think this is what you want: you have to allow one of the IPs mentioned there in the firewall (I'm not sure which one right now).
Just as you and @Ema mentioned, this is expected behavior, bearing in mind that traffic from Cloud Run to App Engine is internal.
When you use Cloud NAT to send all traffic through it, this is what happens: if you create a container and ping, say, www.github.com, you will find that the traffic goes through the IP you set. On the other hand, if you ping www.google.com, the site you're reaching is in the same infrastructure, so the request is internal and doesn't even go through the public internet.
Additionally, keep in mind that the static outbound IP address feature is still in Beta, and it is not recommended to use Beta features/products in production environments.
As you mentioned and as it is stated in Allowing requests from your services:
Creating a rule for IP 0.0.0.0 will apply to all Compute Engine instances with Private Google Access enabled, not only the ones you own. Similarly, allowing requests from 0.1.0.40 or 10.0.0.1 will allow any App Engine app to make URL Fetch requests to your app.
These questions might be of interest:
What are the outbound IP ranges for GCP managed Cloud Run?
Possible to get static IP address for Google Cloud Functions?

Google App Engine Manual Scaling Prevents Restart

I have a Python App Engine app that handles API results, and it is stateful. However, it seems that after a few hours of inactivity (no requests), the server shuts down, resetting all state; when a new request is made, it is listening again.
But the state has been reset. I want the server to stay up 24/7 and never reset/restart, because I want to maintain state.
I have configured it as per the documentation, but it's still restarting, and I am not sure what's wrong.
Here is my app.yaml:
runtime: python37
entrypoint: python main.py
manual_scaling:
  instances: 1
In App Engine the general recommendation is to create stateless applications, as mentioned in the documentation:
Your app should be "stateless" so that nothing is stored on the instance.
As an alternative, if the application must not be restarted, you can deploy it on Compute Engine; as that service is a virtual machine, you have total control over its state.

App Engine Standard, Serverless VPCs, Cloud Memorystore giving significant amount of timeouts

We configured our App Engine Standard Python 3 service to connect to Cloud Memorystore via Serverless VPC Access, per the documentation and other Stack Overflow threads (I've included the app.yaml config below). This all worked well, unless an instance went idle for a little while. Over time we saw a high volume of:
Long unexplained hangs when making calls to Memorystore, even though they eventually worked
redis.exceptions.ConnectionError: Error 110 connecting to 10.0.0.12:6379. Connection timed out.
redis.exceptions.TimeoutError: Timeout reading from socket
These happened to the point where I had to move back to App Engine Flexible, where the service runs great without any of the above problems.
My conclusion is that Serverless VPC does not handle the fact that the redis client tries hard to keep the connection to redis open all the time. I tried a few variations of timeout settings, but none of them helped. Has anyone successfully deployed App Engine Standard, Memorystore, and Serverless VPC together?
env_variables:
  REDISHOST: <IP>
  REDISPORT: 6379
network:
  name: "projects/<PROJECT-ID>/global/networks/default"
vpc_access_connector:
  name: "projects/<PROJECT-ID>/locations/us-central1/connectors/<VPC-NAME>"
Code used to connect to Memorystore (using redis-py):
import os
import redis

# Connection details come from app.yaml's env_variables.
REDIS_HOST = os.environ.get('REDISHOST', 'localhost')
REDIS_PORT = int(os.environ.get('REDISPORT', 6379))

REDIS_CLIENT = redis.StrictRedis(
    host=REDIS_HOST,
    port=REDIS_PORT,
    retry_on_timeout=True,
    health_check_interval=30,
)
(I tried various timeout settings but couldn't find anything that helped)
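For concreteness, the variations I tried looked roughly like this (the values are illustrative; none of them eliminated the hangs):
REDIS_CLIENT = redis.StrictRedis(
    host=REDIS_HOST,
    port=REDIS_PORT,
    socket_connect_timeout=5,    # illustrative: fail fast on dead connections
    socket_timeout=10,           # illustrative: cap blocking reads
    socket_keepalive=True,
    retry_on_timeout=True,
    health_check_interval=30,
)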
I created a Memorystore instance and a Serverless VPC Access connector as stated in the docs (https://cloud.google.com/vpc/docs/configure-serverless-vpc-access), then deployed this sample (https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/appengine/standard_python37/redis) from the Google Cloud Platform Python doc samples repo to App Engine Standard, after making some modifications:
This is my app.yaml:
runtime: python37

# Update with Redis instance details
env_variables:
  REDIS_HOST: <memorystore-ip-here>
  REDIS_PORT: 6379

# Update with Serverless VPC Access connector details
vpc_access_connector:
  name: 'projects/<project-id>/locations/<region>/connectors/<connector-name>'
I edited the code in main.py to use the snippet you use to connect to the Memorystore instance. It ended up like this:
redis_client = redis.StrictRedis(
    host=redis_host,
    port=redis_port,
    password=redis_password,
    retry_on_timeout=True,
    health_check_interval=30,
)
I edited requirements.txt, changing redis==3.3.8 to redis>=3.3.0.
Things to note:
Make sure to use gcloud beta app deploy instead of gcloud app deploy, since the beta command is needed for the Serverless VPC Access connector to work.
Make sure the authorized network you set on the Memorystore instance is the same one you select for the Serverless VPC Access connector.
This works as expected for me, could you please check if this works for you?
You may try the min idle instances option, so you will always have at least one idle instance waiting to serve your traffic. Bear in mind that this may change your billing cost; also, here you can find a billing calculator.
If min idle instances is set to 0, there are no instances available to serve traffic when requests start arriving, and this may be the reason for the exceptions.
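In app.yaml for the standard environment this would look something like the following (a minimal sketch; tune the value to your traffic and budget):
automatic_scaling:
  min_idle_instances: 1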

Reaching quota for ip address in use too fast for App engine

I have a webserver running in App Engine, and the client is a mobile app. A lot of requests on the mobile side started failing once we scaled up to a lot of users, yet I am not seeing any failures in our logs. I noticed in our quotas that 'IP addresses in use' for the Compute Engine API is at its maximum of 8 (even though we're not running any services on Compute Engine).
I am not sure if this is the root cause, but it wasn't like this before. Is there any advice on how to address this problem, or a better way to structure our server for our use case?
EDIT:
Our current configuration is a flex environment on App Engine with a minimum of 2 instances. We also have a MySQL instance. That's pretty much everything we've used so far.
runtime: php
env: flex
api_version: 1

handlers:
- url: /.*
  script: public/index.php

runtime_config:
  document_root: public

beta_settings:
  # for Cloud SQL, set this value to the Cloud SQL connection name,
  # e.g. "project:region:cloudsql-instance"
  cloud_sql_instances: "<project>:<region>:<sql-instance>"
You didn't mention it in your question, but I believe you are using the App Engine flexible environment. Under the hood, App Engine flex apps run on (hidden from you) Compute Engine instances in your project, so they count against Compute Engine quotas as well, including the 'IP addresses in use' quota for your App Engine region.
The 'IP addresses in use' quota impacts your App Engine flex app by limiting the number of instances it can scale up to, since each instance uses its own IP. For example, per the app.yaml file you provided, your scaling setting defaults to automatic scaling with a minimum of 2 and a maximum of 20 instances. The quota will prevent your app from scaling above 8 instances as the number of users increases.
One other thing to note: you may have previous versions of your service still running. If they have the same scaling settings, each keeps a minimum of 2 instances running, which also counts toward the 'IP addresses in use' quota.
Since you can't deploy your App Engine instances in a network in a region other than the one set for your App Engine app, the real fix is to request a quota increase. In your Developer Console, go to IAM & admin > Quotas, select this particular quota, click the 'Edit Quotas' button at the top, and follow the instructions.
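In the meantime, you can keep deployments under the limit by capping scaling in app.yaml (a sketch for the flex environment; 8 matches the current quota) and by stopping any old versions that are still serving with gcloud app versions stop:
automatic_scaling:
  min_num_instances: 2
  max_num_instances: 8  # stay within the current "IP addresses in use" quota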

Debugging GAE microservices locally but without using localhost

I would like to debug my Google App Engine (GAE) app locally but without using localhost. Since my application is made up of microservices, the urls in a production environment would be along the lines of:
https://my-service.myapp.appspot.com/
But code in one service can call another service, which means the URLs are hardcoded. I could of course use a mechanism in code to determine whether the app is running locally or on GAE and use different URLs, although I don't see how a local URL would handle the service name, since the only way to run an app locally is to use localhost. Hence:
http://localhost:8080/some-service
Notice that "some-service" maps to a servlet, whereas "my-service" is a name assigned to a service when the app is uploaded. These are really two different things.
The only possible solution I was able to find was to use a reverse proxy which would map one url to a different one. Still, it isn't clear whether the GAE development SDK even supports this.
Personally, I chose to detect the local development vs GAE environment and build my inter-service URLs accordingly. It was a worthwhile effort; I've been (re)using the approach a lot. No reverse proxy or any other additional ops necessary, it just works.
Granted, I'm using Python, so I'm not 100% sure a completely equivalent Java solution exists. But maybe it can point you in the right direction.
To build the per-service URLs I used modules.get_hostname() (the implementation is presented in Resolve Discovery path on App Engine Module). I believe the Java equivalent would be getInstanceHostname() from com.google.appengine.api.modules.
This method, when executed on the local server, automatically provides the particular port the server listens on for each service.
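A minimal sketch of the idea in Python (the service name and path are placeholders):
from google.appengine.api import modules

def service_url(service, path):
    # On the local dev server, get_hostname() returns 'localhost:<port>'
    # with the right port for each service; in production it returns the
    # service's appspot hostname, so the same code works in both places.
    host = modules.get_hostname(module=service)
    return 'http://%s%s' % (host, path)

# e.g. service_url('my-service', '/some-service')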
BTW, all my services for an app are executed by a single development server process, which listens on multiple ports (this is, I guess, how it can provide the modules.get_hostname() info); see Running multiple services using dev_appserver.py on different ports. The part I'm unsure about is if/how the Java local dev server can simultaneously run multiple services; apparently this used to be supported some time ago (when services were still called modules):
Serving multiple GAE modules from one development server?
GAE modules on development server
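For reference, with the Python dev server this is just a matter of passing every service's config to a single dev_appserver.py invocation (the paths here are hypothetical):
dev_appserver.py my-service/app.yaml other-service/app.yaml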
This can be accomplished with the following steps:
Create an entry in the hosts file
Run the App Engine Dev server from a Terminal using certain options
Use IntelliJ with Remote debugging to attach the App Engine Dev server.
To edit the hosts file on a Mac, edit /etc/hosts and add the domain that corresponds to your service. Example:
127.0.0.1 my-service.myapp.com
After you save this, you need to restart your computer for the changes to take effect.
Run the App Engine Dev server manually:
dev_appserver.sh --address=0.0.0.0 \
  --jvm_flag=-Xdebug \
  --jvm_flag=-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000 \
  [path_to_exploded_war_directory]
In IntelliJ, create a debug configuration using the Remote template. Set the host to the domain you put in the hosts file and set the port to 8000.
You can now set a breakpoint and run the app in IntelliJ; IntelliJ will attach to the running instance of the App Engine Dev server.
Because a port is used during local debugging but no explicit port is used when the app is running on GAE in production, you need to add code that identifies whether the app is running locally or on GAE. This can be done as follows:
private String mServiceUrl = "my-service.my-app.appspot.com";
...
if (SystemProperty.environment.value() != SystemProperty.Environment.Value.Production) {
    // Not production, so we're on the local dev server: append the local port.
    mServiceUrl += ":8000";
}
See https://cloud.google.com/appengine/docs/standard/java/tools/using-local-server
An improved solution avoids including the port altogether and needs no code to determine whether your app is running locally or on the production server. One way to do this is to use Charles (an application for monitoring and interacting with requests) and its Remote Mapping feature, which lets you map one URL to another. When enabled, you could map something like:
https://my-service.my-app.appspot.com/
to
https://localhost:8080
You would then enable the option to include the original host, so that it gets delivered to the local dev server. As far as your code is concerned, it only sees:
https://my-service.my-app.appspot.com/
although the IP address will be 127.0.0.1:8080 while remote mapping is enabled. Using https on localhost does, however, require enabling SSL certificates in Charles.
For a complete overview of how to set up and debug microservices for a GAE Java app in IntelliJ, see:
https://github.com/JohannBlake/gae-microservices
