Earlier today we started to see instances of server errors popping up on an old realtime document. This is a persistent error and the end result appears to be that the document is completely inaccessible using the gapi.drive.realtime.load endpoint. Not great.
However, the same document is accessible through the gapi.client.drive.realtime.get endpoint, which is great for data recovery but not so great for actually using the document. It's possible I could 'fix' the document by doing a 'drive.realtime.update', but I haven't tried, as hopefully the doc can be used to track down the bug.
Document ID: 0B9I5WUIeAEJ1Y3NLQnpqQWVlX1U
App ID: 597847337936
500 Error Message: "Document was not successfully migrated to new UserKey format"
Anyone else seeing this issue? Can I provide any additional information?
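In case it helps anyone else with recovery in the meantime, here is a minimal sketch of pulling the document contents out through the REST realtime.get endpoint (ACCESS_TOKEN is a placeholder for an OAuth 2.0 token with the Drive scope):

import requests

FILE_ID = "0B9I5WUIeAEJ1Y3NLQnpqQWVlX1U"
resp = requests.get(
    "https://www.googleapis.com/drive/v2/files/%s/realtime" % FILE_ID,
    headers={"Authorization": "Bearer ACCESS_TOKEN"},  # placeholder token
)
resp.raise_for_status()
with open("realtime_backup.json", "wb") as f:
    f.write(resp.content)  # raw realtime document JSON, kept as a backup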
Google App Engine (GAE) creates a BigQuery table for each day of logging, with rows containing several pieces of log information, such as "status" (e.g. 500, 404).
Yet this table does not contain the resolution status (e.g. "Acknowledged", "Open") shown for errors in Error Reporting.
At the moment, I would like to get how many "Acknowledged" errors happen per day. I can get which errors happen per day through the appengine_googleapis_com_request_log_* tables (e.g. appengine_googleapis_com_request_log_20211130). However, I don't know how to find out whether an error is Acknowledged or Open.
Does anyone know how I can combine this information, or at least whether Error Reporting saves its data in any BigQuery table?
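For context, a per-day error count can be pulled from those export tables with a query along these lines (a sketch using the Python BigQuery client; my_project and my_dataset are placeholders, and the protoPayload.status field name is an assumption based on the standard App Engine request-log export schema):

from google.cloud import bigquery

client = bigquery.Client()
query = """
SELECT _TABLE_SUFFIX AS day, COUNT(*) AS errors
FROM `my_project.my_dataset.appengine_googleapis_com_request_log_*`
WHERE protoPayload.status >= 500
GROUP BY day
ORDER BY day
"""
for row in client.query(query).result():
    print(row.day, row.errors)  # daily 5xx counts; resolution status is absent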
After reviewing the Google documentation to help with your question, it appears it is not possible to get the resolution status.
You can view your errors on the Error Reporting page of your GCP Console, which displays a list of all errors in the order of frequency. Errors with the same root cause are grouped together. The error reporting list provides the following information for all reported errors:
Resolution status
Occurrences
Users
Error
Seen in
First seen
Last seen
Response code
If you would like more information, you can review the viewing errors documentation.
Now, as I said, there is no way to get the resolution status; however, what you can do is file a feature request.
Here is some documentation that shows what you can get with the API; it could help you with the feature request.
https://cloud.google.com/support/docs/issue-trackers
https://cloud.google.com/error-reporting/reference/rest/v1beta1/ErrorEvent
Additionally, here is a link I found regarding the Error Processing and Log Monitoring documentation using GCP.
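To illustrate, here is a minimal sketch of what the v1beta1 API does expose (per-group counts and affected-user counts, but no resolution status); it assumes the google-api-python-client discovery client and application default credentials:

from googleapiclient.discovery import build

service = build("clouderrorreporting", "v1beta1")
stats = service.projects().groupStats().list(
    projectName="projects/my-project",  # placeholder project id
    timeRange_period="PERIOD_1_DAY",    # the timeRange.period query param
).execute()
for s in stats.get("errorGroupStats", []):
    print(s["group"]["groupId"], s["count"], s.get("affectedUsersCount"))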
I am having a problem with some of my App Engine projects: for the past few days I have been seeing a lot of errors (which, I noticed, may happen when a health check arrives) in my vm.syslog logs from Stackdriver Logging.
Specifically, these are:
write_gcm: Server response (CollectdTimeseriesRequest) contains errors:
{
  "payloadErrors": [
    {
      "index": 71,
      "error": {
        "code": 3,
        "message": "Expected 4 labels. Found 0. Mismatched labels for payload [values {\n data_source_name: \"value\"\n data_source_type: GAUGE\n value {\n double_value: 694411264\n }\n}\nstart_time {\n seconds: 1513266364\n nanos: 618061284\n}\nend_time {\n seconds: 1513266364\n nanos: 618061284\n}\nplugin: \"processes\"\nplugin_instance: \"all\"\ntype: \"ps_rss\"\n] on resource [type: \"gce_instance\"\nlabels {\n key: \"instance_id\"\n value: \"xxx\"\n}\nlabels {\n key: \"zone\"\n value: \"europe-west2-a\"\n}\n] for project xxx"
      }
    }
  ]
}
write_gcm: Unsuccessful HTTP request 400: {
  "error": {
    "code": 400,
    "message": "Field timeSeries[11].metric.labels[1] had an invalid value of \"health_check_type\": Unrecognized metric label.",
    "status": "INVALID_ARGUMENT"
  }
}
write_gcm: Error talking to the endpoint.
write_gcm: wg_transmit_unique_segment failed.
write_gcm: wg_transmit_unique_segments failed. Flushing.
At the same time, I noticed that the Memory Usage in the App Engine dashboard for those very same projects keeps increasing over time, to the point where it reaches the maximum amount available and the instance restarts, throwing a 502 error when visiting the web site that the app is serving.
None of this is happening on a couple of projects that have not been updated for at least 2 weeks (neither the errors above nor the memory increase), but it does happen on a newly created instance deployed with the same codebase as one of the healthy projects. In addition, I don't see any increase in memory when running my project locally.
Can someone tell me if they have experienced something similar, or whether they think the errors and the memory increase are related? I haven't changed my yaml deployment file recently, and I haven't specified any custom configuration for the health checks (which run in legacy mode at the default rate).
Thank you for your help,
Nicola
Similar question here: App Engine Deferred: Tracking Down Memory Leaks
Going through the same thing in Compute Engine on a single VM. I've tried increasing memory, but the problem persists. It seems to be tied to a Stackdriver method call. Not sure what to do; it causes machines to stop after about 24 hours for me. In my case, I'm pulling information every 3 seconds from a set of APIs, but the error comes up every minute on serial port 1 (console), which makes me suspect it is some kind of failure outside of my code. More from Google here: https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.collectdTimeSeries/create
I'm not sure about all of the errors, but for the "write_gcm: Server response (CollectdTimeseriesRequest)" one, I had the same issue and contacted Google Cloud Support. They told me that the Stackdriver service had been updated recently to accept more detailed information on ps_rss metrics, but that this caused metrics from older agents not to be sent at all.
You should be able to fix this issue by upgrading your Stackdriver agent to the latest version. On Compute Engine (which I was running) you have control over this; I'm not sure how you'd do it on App Engine, maybe trigger a new deploy?
I am suddenly getting a 403 error when I try to POST an update to the Retrieve and Rank service. This code is under development but it has been working up until yesterday. The failure occurs only when doing a POST to /v1/solr_clusters/{solr_cluster_id}/solr/{collection_name}/update, and it fails the same way whether I do it via my program, the Swagger API documentation, or cURL. All other operations to this service that I've tried work fine when using the same credentials that I'm using with this POST. The error message I'm getting back is
Error: WRRCSH004: Service [1d111267-76b7-417a-98bd-4e9a58072ef9] is not authorized for cluster [sc262b05e8_dcf5_40b4_b662_ae85058ff07f]!. I don't know where the identifier (1d111267-76b7-417a-98bd-4e9a58072ef9) is coming from; that's not the userid I'm sending in.
Looking into your issue, it appears your Bluemix organization has multiple service instances. The 403 issue you were seeing is because you're trying to access a Solr cluster in one instance using credentials from the other instance. The 1d111267-76b7-417a-98bd-4e9a58072ef9 identifier represents one of these service instances; the problem is that the cluster you're trying to access is not part of that instance. A good way to test this is to keep using the same credentials that generate the 403, but simply list the Solr clusters you have created by doing a GET against https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/.
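For example, a quick sketch of that test using the Python requests library (username and password are placeholders for the service credentials that produce the 403):

import requests

resp = requests.get(
    "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/",
    auth=("username", "password"),  # placeholder service credentials
)
print(resp.status_code)
print(resp.json())  # should list only the clusters owned by this instance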
As for the 500 issue, I wasn't able to see anything on our end. If you're still experiencing that I would suggest posting another question and we can look into things again.
Thanks,
-Scott
UPDATE: Please, if anyone can help: Google is waiting for input and examples of this problem on their bug tracker. If you have reproducible steps for this issue, please share them at: https://code.google.com/p/googleappengine/issues/detail?id=10937
I'm trying to fetch data from the StackExchange API using a Google App Engine backend. As you may know, some of StackExchange's APIs are site-specific, requiring developers to run queries against every site the user is registered in.
So, here's my backend code for fetching timeline data from these sites. The feed_info_site variable holds the StackExchange site name (such as 'security', 'serverfault', etc.).
data = json.loads(urllib.urlopen("%sme/timeline?%s" % (
    self.API_BASE_URL,
    urllib.urlencode({
        "pagesize": 100,
        "fromdate": se_since_timestamp,
        "filter": "!9WWBR(nmw",
        "site": feed_info_site,
        "access_token": decrypt(self.API_ACCESS_TOKEN_SECRET, self.access_token),
        "key": self.API_APP_KEY,
    }),
)).read())
for item in data['items']:
... # code for parsing timeline items
When running this query on all sites except Stack Overflow, everything works OK. What's weird is, when the feed_info_site variable is set to 'stackoverflow', I get the following error from Google App Engine:
HTTPException: Invalid and/or missing SSL certificate for URL:
https://api.stackexchange.com/2.2/me/timeline?filter=%219WWBR%28nmw&access_token=<ACCESS_TOKEN_REMOVED>&fromdate=1&pagesize=100&key=<API_KEY_REMOVED>&site=stackoverflow
Of course, if I run the same query in Safari, I get the JSON results I'm expecting from the API. So the problem really lies in Google's URLfetch service. I found several topics here on Stack Overflow about similar HTTPS/SSL exceptions, but no accepted answer solved my problem. I tried removing cacerts.txt files. I also tried making the call with validate_certificate=False, with no success.
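For reference, that last variant looked roughly like this sketch (the query parameters are trimmed to the essentials):

import json
from google.appengine.api import urlfetch

# same timeline query as above, with certificate validation disabled
url = "https://api.stackexchange.com/2.2/me/timeline?site=stackoverflow"
result = urlfetch.fetch(url, validate_certificate=False)
data = json.loads(result.content)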
I think the problem is not strictly related to HTTPS/SSL. If it were, how would you explain that changing a single API parameter makes the request fail?
Wait for the next update to App Engine (one is scheduled soon), then update.
Replace browserid.org/verify with another service (verifier.login.persona.org/verify is a good service hosted by Mozilla which could be used; see the sketch below)
Make sure cacerts.txt doesn't exist (looks like you have that sorted, but just in case :-) )
Attempt again
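A minimal sketch of step 2 on App Engine (the verify_assertion helper is illustrative; the parameter names follow the BrowserID verification API):

import json
import urllib
from google.appengine.api import urlfetch

def verify_assertion(assertion, audience):
    # audience must be your app's origin, e.g. "https://your-app.appspot.com"
    result = urlfetch.fetch(
        "https://verifier.login.persona.org/verify",
        payload=urllib.urlencode({"assertion": assertion,
                                  "audience": audience}),
        method=urlfetch.POST,
        validate_certificate=True,
    )
    return json.loads(result.content)  # {"status": "okay", ...} on success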
Good luck!
-Brendan
I was facing the same error. Google has now updated App Engine and the error is resolved; please check the updated docs.
I was able to get the Codelab example up and running, but when I try to switch to an existing file that opendatakit has pushed into my datastore, it says it can't find it. I updated
ENTITY_KIND = 'main.ProductSalesData'
to
ENTITY_KIND = 'opendatakit.testl'
also tried ENTITY_KIND = 'main.opendatakit.testload1'
and updated class ProductSalesData(db.Model): to class testload1(db.Model):
but no luck. Still trying to get up to speed on everything, but I think I'm missing something simple. Also feeling like the Google documentation is pulling me in different directions. Is it a permissions issue or just a naming convention issue?
Error message:
MapperPipeline
Aborted
Abort Message: Aborting after 3 attempts
Retry Message: BadReaderParamsError: Bad entity kind: Could not find 'testload1' on path 'opendatakit'
Just not sure how to point to an existing file.
Thanks
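For anyone hitting the same error: the retry message suggests the pipeline resolves the entity kind as a Python import path (module 'opendatakit', attribute 'testload1'), not as a file. A minimal sketch of what it appears to expect, assuming the datastore kind is testload1 and the model class is defined in main.py:

from google.appengine.ext import db

class testload1(db.Model):
    # the class name must match the datastore kind; db.Expando can be used
    # instead if the schema written by opendatakit is unknown
    pass

ENTITY_KIND = 'main.testload1'  # module path + class name, not a file path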