"Error querying agent backend. State: URL_TIMEOUT, reason: TIMEOUT_DNSLOOKUP" - google-smart-home

For the past 1.5-2 years my Google smart home action was working absolutely fine with device sync, state query and all relevant actions.
For the last 2 months I have been getting the following error, although I haven't changed anything:
0: {
  action: {
    actionType: "STATE_QUERY"
  }
  device: {
    deviceType: "LIGHT"
  }
  status: {
    externalDebugString: "Error querying agent backend. State: URL_TIMEOUT, reason: TIMEOUT_DNSLOOKUP"
    isSuccess: false
    statusType: "EXECUTION_BACKEND_FAILURE"
  }
}
]
executionType: "PARTNER_CLOUD"
latencyMsec: "2834"
requestId: "5786688694498341746"
}
]
}
locale: "en-US"
}
Now the smart home devices do not report state and cannot be controlled, and the Google Home app shows "Not Responding". The strange thing is that it does sometimes work (2 out of 10 times, I would say).
Another piece of info: the server is hosted at my data centre and absolutely no changes have been made in terms of network, DNS, etc.
Can anyone please advise what could be causing this and how it could be resolved? Help is highly appreciated.

The error returned has the status type EXECUTION_BACKEND_FAILURE, which means the Google smart home execution service tried to locate and reach your fulfillment endpoint but could not get a valid response, potentially for one of many different reasons.
The error log indicates that Google is attempting a DNS lookup for your endpoint and the lookup is timing out. You should check your server settings and make sure your domain name resolves correctly to your server's IP in your DNS records (you can use a tool like nslookup to verify how it resolves).
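For a quick programmatic check from outside your network, a minimal Node.js sketch like the one below can help (the hostname is a placeholder; substitute your actual fulfillment endpoint). It should fail with a timeout or not-found error if public resolvers cannot resolve your domain:
// Minimal sketch: verify the fulfillment hostname resolves from the public internet.
// "fulfillment.example.com" is a placeholder; use your real endpoint hostname.
const dns = require('dns').promises;

dns.resolve4('fulfillment.example.com')
  .then((addresses) => console.log('Resolved A records:', addresses))
  .catch((err) => console.error('DNS resolution failed:', err.code));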

Related

Google Action does not have trained NLU model (No trained NLU model found.) [duplicate]

We are developing a bot using the Actions on Google SDK. We migrated our dev project to UAT and all of a sudden it stopped working. Previously we were using the same approach and it worked every time. The bot responds once to the initial phrase and after that it stops responding; it says "Sorry, [Bot name] is not responding. Please try again later." After tracing the logs we found it is returning the error below. Please guide us on what is wrong with our approach.
{
  labels: {3}
  type: "assistant_action"
}
severity: "ERROR"
textPayload: "No trained NLU model found."
timestamp: "2022-02-17T12:00:35.499117218Z"
trace: ""
}
This issue is resolved now. Google resolved it and we needed to follow these steps: open your project -> go to Main invocation -> edit it by adding a prompt message -> save the changes -> "NLU model training in progress" will now show up at the bottom; wait until it finishes -> after that, try testing your action and it will work. Please see this for more.

AWS Appsync GraphQL query silently failing though successful 200 response

I have one React Amplify app running with two environments. One environment is for my wife's blog (www.riahraineart.com) and one is for my blog (www.joshmk.com). Both sites are running off the same repo; I'm just configuring the sites differently based on an environment variable I use to retrieve their configurations from a table.
useEffect(() => {
  // Fetch the site's configuration record from AppSync using the ID in the env var.
  async function fetchData() {
    const configData = await API.graphql({
      query: queries.getConfiguration,
      variables: { id: process.env[configIdName] },
    });
    // Only update state if the component is still mounted.
    if (configData && isMounted.current)
      setConfig(configData.data.getConfiguration || {});
  }
  if (process.env[configIdName]) {
    fetchData();
  }
}, [isMounted, configIdName]);
For my site, when I make the GraphQL request for this configuration, it's successful and the site spins up. For my wife's site, this call to the configurations table silently fails. By silent, I mean there's no helpful response being returned from the API even though it's a successful 200 response.
When I open AppSync, go to the two environments, and run the queries, I receive the configuration items. I also see them when I open DynamoDB.
I'm thinking there could be some expired token for something somewhere but if that was the case, I would think I'd receive a failed response that would state that.
Another possibility could be that my wife had modified the configuration of her site or created a post with some content that the frontend doesn't expect. But in that case, I would at least expect to see a response from the call that retrieves her site's configuration.
Thank you beforehand for any insights here!
I fixed it! Unfortunately, the solution was pretty specific to my situation so it may not provide too much value for others. Although, I hope it helps with troubleshooting.
After locally switching my backend configuration over to her site using amplify pull --appId <appId> --envName <envName>, I noticed that the configuration call was now successful. I had forgotten that I had never actually run her site locally; I had only hopped to her branch to merge and push.
The site was still not rendering though, which perked my ears for a race condition. I discovered that I had left in a check for some images that was gating render of my topmost component. My wife has a ton of images, so I think this call was taking too long for the chain of events to load items in the correct order, and the page showed blank. Simply removing that check for those images showed the UI.
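For anyone debugging a similar "silent" failure, a minimal sketch along the lines of the hook above (the names mirror my code, but treat it as illustrative) that surfaces GraphQL errors instead of swallowing them can help narrow things down, since AppSync can return HTTP 200 together with an errors array, and Amplify rejects with an object that can still carry partial data:
// Illustrative debugging sketch only: log both the "errors" array on a 200
// response and the error object Amplify throws, instead of failing silently.
async function fetchData() {
  try {
    const configData = await API.graphql({
      query: queries.getConfiguration,
      variables: { id: process.env[configIdName] },
    });
    if (configData.errors) {
      console.warn('getConfiguration returned GraphQL errors:', configData.errors);
    }
    if (isMounted.current) setConfig(configData.data.getConfiguration || {});
  } catch (err) {
    console.error('getConfiguration failed:', err.errors || err);
  }
}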

Errors in vm.syslog and Memory Usage constantly increasing on NodeJS AppEngine

I am having a problem on some of my AppEngine projects: since a few days ago I have been seeing a lot of errors (which I noticed might happen when a health check arrives) in my vm.syslog logs from Stackdriver Logging.
Specifically, these are:
write_gcm: Server response (CollectdTimeseriesRequest) contains errors:#012{#012 "payloadErrors": [#012 {#012 "index": 71,#012 "error": {#012 "code": 3,#012 "message": "Expected 4 labels. Found 0. Mismatched labels for payload [values {\n data_source_name: \"value\"\n data_source_type: GAUGE\n value {\n double_value: 694411264\n }\n}\nstart_time {\n seconds: 1513266364\n nanos: 618061284\n}\nend_time {\n seconds: 1513266364\n nanos: 618061284\n}\nplugin: \"processes\"\nplugin_instance: \"all\"\ntype: \"ps_rss\"\n] on resource [type: \"gce_instance\"\nlabels {\n key: \"instance_id\"\n value: \"xxx\"\n}\nlabels {\n key: \"zone\"\n value: \"europe-west2-a\"\n}\n] for project xxx"#012 }#012 }#012 ]#012}
write_gcm: Unsuccessful HTTP request 400: {#012 "error": {#012 "code": 400,#012 "message": "Field timeSeries[11].metric.labels[1] had an invalid value of \"health_check_type\": Unrecognized metric label.",#012 "status": "INVALID_ARGUMENT"#012 }#012}
write_gcm: Error talking to the endpoint.
write_gcm: wg_transmit_unique_segment failed.
write_gcm: wg_transmit_unique_segments failed. Flushing.
At the same time, I noticed that the Memory Usage in the AppEngine dashboard for the very same projects increases over time to the point where it reaches the maximum amount available and the instance restarts, throwing a 502 error when visiting the web site that the app is serving.
None of this is happening on a couple of projects that have not been updated for at least 2 weeks (neither the errors above nor the memory increase), but it does happen on a newly created instance deployed with the same codebase as one of the healthy projects. In addition, I don't see any increase in memory when running my project locally.
Can someone kindly tell me if they have experienced something similar, or if they think the errors and the memory increase are related? I haven't changed my yaml file for deployment recently and I haven't specified any custom configuration for the health checks (which run in legacy mode at the default rate).
Thank you for your help,
Nicola
Similar question here: App Engine Deferred: Tracking Down Memory Leaks
I'm going through the same thing on Compute Engine on a single VM. I've tried increasing memory but the problem persists. It seems to be tied to a Stackdriver method call. Not sure what to do; it causes machines to stop after about 24 hours for me. In my case, I'm getting information every 3 seconds from a set of APIs, but the error comes up every minute on serial port 1 (console), which makes me suspect that it is some kind of failure outside of my code. More from Google here: https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.collectdTimeSeries/create .
I'm not sure about all of the errors, but for the "write_gcm: Server response (CollectdTimeseriesRequest)" I had the same issue and contacted Google Cloud Support. They told me that the Stackdriver service has been updated recently to accept more detailed information on ps_rss metrics, but it has caused metrics from older agents to not be sent at all.
You should be able to fix this issue by upgrading your Stackdriver agent to the latest version. On Compute Engine (which I was running) you have control over this; I'm not sure how you'd do it on AppEngine, maybe trigger a new deploy?

Realtime document permanently unable to be loaded due to server error

Earlier today we started to see instances of server errors popping up on an old realtime document. This is a persistent error and the end result appears to be that the document is completely inaccessible using the gapi.drive.realtime.load endpoint. Not great.
However, the same document is accessible through the gapi.client.drive.realtime.get endpoint, which is great for data recovery, but not so great for actually using the document. It's possible I can 'fix' the document by doing a 'drive.realtime.update', but I haven't tried, as hopefully the doc can be used to track down the bug.
Document ID: 0B9I5WUIeAEJ1Y3NLQnpqQWVlX1U
App ID: 597847337936
500 Error Message: "Document was not successfully migrated to new UserKey format"
Anyone else seeing this issue? Can I provide any additional information?

Unsure on how to solve a "termsOfServiceNotAccepted" Error

Background:
So I'm a novice to the whole App Engine thing: I have made an app on Google App Engine that accepts user input on the main page and then sends the information to a handler, which uses the BigQuery API to run a synchronous query against some tables I have uploaded to BigQuery. The handler then sends back the results of the query as JSON.
Problem:
In deployment it mostly works, except that a user can often run into this error while trying to make the synchronous query:
Error in runSyncQuery:
{
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "termsOfServiceNotAccepted",
        "message": "BigQuery Terms of Service have not been accepted"
      }
    ],
    "code": 403,
    "message": "BigQuery Terms of Service have not been accepted"
  }
}
After doing some searching I came across this:
https://groups.google.com/forum/#!msg/bigquery-announce/l0kwVNpQX5A/ct0MglMmYMwJ
If you make API calls that are authenticated as an end user, then API calls will soon return errors when the end user has not accepted the updated Terms of Service. Apps built against the BigQuery API should ideally look for those errors and direct the user to the Google APIs Console to accept the new terms.
Except I don't really understand how to do this.
Also, all the potential user accounts that I have tested my app with have access to a specific project that has the BigQuery API enabled, and they can use the API, so why does this pop up?
Also, there are times when a specific account does not run into this problem. For instance, if I keep refreshing and retrying to use the app, eventually it will not have this problem and will work. But then the next time this error will resurface again.
I don't understand how a user can have accepted the terms of service at one point in time and then not at some point in the future.
Yes, any end users who authorize access to the BigQuery API must accept the Terms of Service (ToS) provided by the Google Developers Console at https://developers.google.com/console
It is possible that the Terms of Service change, and that some of your project members have not yet accepted the updated BigQuery ToS. If one of your users is receiving this message when authorizing access to the BigQuery API, you should redirect them to https://developers.google.com/console to accept the terms of service.
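As a minimal sketch of how an app might handle this (assuming a browser client and an illustrative handleBigQueryError helper, not your actual code), you can look for the specific error reason and send the user to the console:
// Illustrative sketch only: look for the "termsOfServiceNotAccepted" reason in the
// error response and send the user to the Developers Console to accept the ToS.
function handleBigQueryError(response) {
  const errors = (response.error && response.error.errors) || [];
  const needsToS = errors.some((e) => e.reason === 'termsOfServiceNotAccepted');
  if (needsToS) {
    // After accepting the updated ToS there, the user can retry the query.
    window.location.href = 'https://developers.google.com/console';
    return;
  }
  console.error('BigQuery query failed:', response.error);
}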
Re: "specific account does not run into this problem" - can you confirm this is happening consistently with a specific account on a specific project?
