cross-group transaction error with only one entity - google-app-engine

I'm trying to execute the code below. Sometimes it works fine, but sometimes it does not.
@db.transactional
def _add_data_to_site(self, key):
    site = models.Site.get_by_key_name('s:%s' % self.site_id)
    if not site:
        site = models.Site()
    if key not in site.data:
        site.data.append(key)
        site.put()
    memcache.delete_multi(['', ':0', ':1'],
                          key_prefix='s%s' % self.site_id)
I'm getting the error:
File "/base/data/home/apps/xxxxxxx/1-7-1.366398694339889874xxxxxxx.py", line 91, in _add_data_to_site
site.put()
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 1070, in put
return datastore.Put(self._entity, **kwargs)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 579, in Put
return PutAsync(entities, **kwargs).get_result()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 604, in get_result
return self.__get_result_hook(self)
File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1569, in __put_hook
self.check_rpc_success(rpc)
File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1224, in check_rpc_success
raise _ToDatastoreError(err)
BadRequestError: cross-group transaction need to be explicitly specified, see TransactionOptions.Builder.withXG
So, my question is:
If I'm changing only one entity (models.Site) why am I getting a cross-group transaction error?

As mentioned in the logs: "Cross-group transaction need to be explicitly specified".
Try specifying it by using:
@db.transactional(xg=True)
instead of:
@db.transactional
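For context, a hedged sketch of what the XG flag permits (the Account model and move_credit function below are invented for illustration): with xg=True a single transaction may touch entities in up to five different entity groups, which the plain decorator does not allow.
from google.appengine.ext import db

class Account(db.Model):          # hypothetical model, for illustration only
    balance = db.IntegerProperty(default=0)

# Illustrative only: two root entities of the same kind live in two
# different entity groups, so updating both requires an XG transaction.
@db.transactional(xg=True)
def move_credit(from_key_name, to_key_name, amount):
    src = Account.get_by_key_name(from_key_name)   # entity group 1
    dst = Account.get_by_key_name(to_key_name)     # entity group 2
    src.balance -= amount
    dst.balance += amount
    db.put([src, dst])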

Does this work if you specify parent=None in your get_by_key_name() query?
Essentially, in order to use a (non-XG) transaction, all entities touched in the transaction must share the same parent (i.e. you query using one parent and create any new entities with that same parent), or you must use an XG transaction. You're seeing the error because you didn't specify a parent.
You may need to create artificial entities to behave as parents in order to do what you're trying to do.
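As a hedged sketch of that approach (the SiteGroup kind and the 'root' key name are invented placeholders), every Site could hang off one artificial parent so the lookup and the put stay in a single entity group:
# Assumes: from google.appengine.ext import db

class SiteGroup(db.Model):
    # Hypothetical property-less kind used only as an entity-group parent.
    pass

def _add_data_to_site(self, key):
    parent_key = db.Key.from_path('SiteGroup', 'root')  # shared artificial parent

    @db.transactional
    def txn():
        site = models.Site.get_by_key_name('s:%s' % self.site_id,
                                           parent=parent_key)
        if not site:
            site = models.Site(key_name='s:%s' % self.site_id,
                               parent=parent_key)
        if key not in site.data:
            site.data.append(key)
            site.put()

    txn()
Note that a single shared parent serializes writes to all Site entities (roughly one write per second per entity group), so a parent per logical group is usually a better fit than one global root.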

I had the same issue. By stepping through the client code, I made the following two observations:
1) Setting the parent to None still seems to imply a parent of that kind, even if no specific record is elected as that parent.
2) Your transaction will include all entities referenced through ReferenceProperty properties as well.
Therefore, you can theoretically expect the cross-group transaction exception whenever you haven't declared a parent (by omitting it or setting it to None) on the kinds you're affecting and at least two kinds are involved: if you're using kind A and kind B, it looks like two different entity groups, one for the A records and one for the B records, and the same goes for any kinds referred to by ReferenceProperty properties.
To fix this, you can create, at a minimum, a kind without any properties and set it as the parent of all of your previously parentless records, as well as of everything their ReferenceProperty properties refer to.
If that's not sufficient, then set the flag for the cross-group transaction.
Also, the text of the exception, for me, was: "cross-groups transaction need to be explicitly specified" (plural "groups"). I have version 1.7.6 of the Python AppEngine client.
Please upvote this answer if it fits your scenario.

A cross-group transaction error refers to the entity groups being used, not to the kind being used (here Site).
When it occurs, it's because you are attempting a transaction on entities with different parents, hence putting them in different entity groups.
SHAMELESS PLUG:
You should stop using db and move your code to ndb, especially since it seems you're in the development phase.
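For what it's worth, a rough ndb sketch of the method from the question (untested; it assumes models.Site has been ported to an ndb model with a repeated data property and that the kind is still called 'Site'):
# Module-level imports assumed:
#   from google.appengine.api import memcache
#   from google.appengine.ext import ndb

@ndb.transactional(xg=True)
def _add_data_to_site(self, key):
    site_key = ndb.Key('Site', 's:%s' % self.site_id)
    site = site_key.get()
    if not site:
        site = models.Site(key=site_key)
    if key not in site.data:
        site.data.append(key)
        site.put()
    memcache.delete_multi(['', ':0', ':1'], key_prefix='s%s' % self.site_id)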

Related

Debezium error, schema isn't known to this connector

I have a project using Debezium, mostly based on this example, which is then connected to Apache Pulsar.
I have changed a few configurations. The file now looks like this:
database.history=io.debezium.relational.history.MemoryDatabaseHistory
connector.class=io.debezium.connector.mysql.MySqlConnector
offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
offset.storage.file.filename=offset.dat
offset.flush.interval.ms=5000
name=mysql-dbz-connector
database.hostname={ip}
database.port=3308
database.user={user}
database.password={pass}
database.dbname=database
database.server.name=test
table.whitelist=database.history_table,database.project_table
snapshot.mode=schema_only
schemas.enable=false
include.schema.changes=false
pulsar.topic=persistent://public/default/{0}
pulsar.broker.address=pulsar://{ip}:6650
database.history=io.debezium.relational.history.MemoryDatabaseHistory
As you may understand, what I'm trying to do is monitor modifications to history_table and project_table in the database and then write the change payloads to Apache Pulsar.
My problem is as follows: whatever snapshot mode I use, once an offset has been written I can't restart Debezium without getting an error on the next database update.
Encountered change event for table database.history_table whose schema isn't known to this connector
It only happens with an existing offset.dat file. I think this is because the schema is null within the offset.dat file. Take this one for example:
¨Ìsrjava.util.HashMap⁄¡√`—F
loadFactorI thresholdxp?#wur[B¨Û¯T‡xpG{"schema":null,"payload":["mysql-dbz-connector",{"server":"test"}]}uq~U{"ts_sec":1563802215,"file":"database-bin.000005","pos":79574,"server_id":1,"event":1}x
I first suspected the schemas.enable=false or the include.schema.changes=false parameters that I used to make the JSON more concise, but their values don't change anything in the offset.dat file.
The problem lies in the line database.history=io.debezium.relational.history.MemoryDatabaseHistory. The in-memory history does not survive a restart, so after restarting, the connector no longer knows the table schemas. You should use FileDatabaseHistory instead.
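As a sketch, the replacement lines in the same properties file might look like this (dbhistory.dat is an arbitrary file name; database.history.file.filename is the companion setting FileDatabaseHistory reads):
database.history=io.debezium.relational.history.FileDatabaseHistory
database.history.file.filename=dbhistory.dat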

Batch collecting in Solr PostFilter

What I want to do is exactly like this:
Solr: How to perform a batch request to an external system from a PostFilter?
and the approach I took is similar:
- don't call super.collect(docId) in the collect() method of the PostFilter, but store all docIds in an internal map
- call the external system in finish(), then call super.collect(docId) for all the docs that pass the external filtering
The problem I have: docId exceeds maxDoc: "docID must be >= 0 and < maxDoc=100000 (got docID=123456)".
I suspect I am storing local docIds; when the reader changes, docBase also changes, so the global docId, which I believe is built in super.collect(docId) from the docId parameter and docBase, becomes incorrect. I've tried storing super.delegate.getLeafCollector(context) along with the docId and calling super.delegate.getLeafCollector(context).collect() instead of super.collect(), but this doesn't work either (I got a null pointer exception).
Look at the code for the CollapsingQParserPlugin in the Solr codebase, particularly CollapsingScoreCollector.finish.
The docIds you receive in the collect() call are not globally unique. The Collapsing collector makes them unique by adding the docBase from the context to the local docId, creating a globalDoc during the collect() phase.
Then in the finish() phase, you must find the context containing the doc in question and set the reader/leafDelegate, depending on which version of Solr you're running. Specifying the right docId with the wrong context will throw exceptions. For the Collapsing collector, you iterate through the contexts until you find the first docBase smaller than the globalDoc.
Finally, if you added docBase in collect(), don't forget to subtract docBase in finish() when you call collect() on the appropriate DelegationCollector object, as the author may or may not have done the first time.

BadArgumentError: _MultiQuery with cursors requires __key__ order in ndb

I can't understand what this error means, and apparently no one on the internet has ever gotten the same error:
BadArgumentError: _MultiQuery with cursors requires __key__ order
This happens here:
return SocialNotification.query().order(-SocialNotification.date).filter(SocialNotification.source_key.IN(nodes_list)).fetch_page(10)
The property source_key is obviously a key and nodes_list is a list of entity keys previously retrieved.
What I need is to find all the SocialNotifications that have a field source_key that match one of the keys in the list.
The error message is trying to tell you that queries involving IN and cursors must be ordered by __key__ (which is the internal name for the key of the entity). This is needed so that the results can be properly merged and made unique. In this case you have to replace your .order() call with .order(SocialNotification._key).
It seems that this also happens when you filter on an inequality and try to fetch a page,
e.g. MyModel.query(MyModel.prop != 'value').fetch_page(...). This basically means (unless I missed something) that you can't fetch_page when using an inequality filter, because on one hand you need the sort to be MyModel.prop but on the other hand you need it to be MyModel._key, which is hard :)
I found the answer here: https://developers.google.com/appengine/docs/python/ndb/queries#cursors
You can change your query to:
SocialNotification.query().order(-SocialNotification.date, SocialNotification.key).filter(SocialNotification.source_key.IN(nodes_list)).fetch_page(10)
in order to get this to work. Note that it seems to be slow (18 seconds) when nodes_list is large (1000 entities), at least on the Development server. I don't have a large amount of test data on a test server.
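Untested sketch of the same key-as-tiebreaker workaround applied to the inequality example above (MyModel and prop are just the placeholder names from that answer):
# Sketch only: the first sort order must match the inequality property,
# and the key is appended as a tiebreaker so fetch_page() can build a cursor.
results, cursor, more = (MyModel.query(MyModel.prop != 'value')
                         .order(MyModel.prop, MyModel._key)
                         .fetch_page(10))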
You need the property you want to order on and key.
.order(-SocialNotification.date, SocialNotification.key)
I had the same error when filtering without a group.
The error occurred every time my filter returned more than one result.
To fix it I actually had to add ordering by key.

How to aggregate files in Mule ESB CE

I need to aggregate a number of inbound CSV files in memory, resequencing them if necessary, on Mule ESB CE 3.2.1.
How could I implement this kind of logic?
I tried with message-chunking-aggregator-router, but it fails on startup because the XSD schema does not allow such a configuration:
<message-chunking-aggregator-router timeout="20000" failOnTimeout="false" >
<expression-message-info-mapping correlationIdExpression="#[header:correlation]"/>
</message-chunking-aggregator-router>
I've also tried to attach my own correlation IDs to inbound messages and then process them with a custom-aggregator, but I've found that Mule internally uses a key made up of:
Serializable key=event.getId()+event.getMessage().getCorrelationSequence();//EventGroup:264
The internal ID is different every time (even when the correlation sequence is correct), so Mule does not group on the correlation sequence alone as I expected, and the same message is processed many times.
Finally, I could write my own custom aggregator, but I would rather use a more established technique.
Thanks in advance,
Gabriele
UPDATE
I've tried with message-chunk-aggregator but it doesn't fit my requirement, as it allows duplicates.
I try to detail the scenario I need to cover:
Mule polls (on a SFTP location)
file 1 "FIXEDPREFIX_1_of_2.zip" is detected and kept in memory somewhere (as an open SFTPStream, it's ok).
Some correlation info is maintained for grouping: group, sequence, group size.
file 1 "FIXEDPREFIX_1_of_2.zip" is detected again, but cannot be inserted because it would be a duplicate
file 2 "FIXEDPREFIX_2_of_2.zip" is detected, and correctly added
once the group size has been reached, Mule routes a MessageCollection with the correct set of messages
Regarding point 2: I'm lucky enough to be able to extract that info from the filename and put it into the MuleMessage::correlation* properties, so that subsequent components can use it.
I did that, but duplicates are still processed.
Thanks again
Gabriele
Here is the right router to use with Mule 3: http://www.mulesoft.org/documentation/display/MULE3USER/Routing+Message+Processors#RoutingMessageProcessors-MessageChunkAggregator

Google App Engine timeout: The datastore operation timed out, or the data was temporarily unavailable

This is a common exception I'm getting in my application's logs, usually 5-6 times a day with traffic of about 1K visits/day:
db error trying to store stats
Traceback (most recent call last):
File "/base/data/home/apps/stackprinter/1b.347728306076327132/app/utility/worker.py", line 36, in deferred_store_print_statistics
dbcounter.increment()
File "/base/data/home/apps/stackprinter/1b.347728306076327132/app/db/counter.py", line 28, in increment
db.run_in_transaction(txn)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 1981, in RunInTransaction
DEFAULT_TRANSACTION_RETRIES, function, *args, **kwargs)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 2067, in RunInTransactionCustomRetries
ok, result = _DoOneTry(new_connection, function, args, kwargs)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 2105, in _DoOneTry
if new_connection.commit():
File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1585, in commit
return rpc.get_result()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 530, in get_result
return self.__get_result_hook(self)
File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1613, in __commit_hook
raise _ToDatastoreError(err)
Timeout: The datastore operation timed out, or the data was temporarily unavailable.
The function that is raising the exception above is the following one:
def store_printed_question(question_id, service, title):
    def _store_TX():
        entity = Question.get_by_key_name(
            key_names='%s_%s' % (question_id, service))
        if entity:
            entity.counter = entity.counter + 1
            entity.put()
        else:
            Question(key_name='%s_%s' % (question_id, service),
                     question_id=question_id,
                     service=service,
                     title=title,
                     counter=1).put()
    db.run_in_transaction(_store_TX)
Basically, the store_printed_question function checks whether a given question was previously printed and, in that case, increments the related counter in a single transaction.
This function is added by a WebHandler to a deferred worker using the predefined default queue that, as you might know, has a throughput rate of five task invocations per second.
On an entity with six attributes (two indexed), I thought that using transactions regulated by the deferred task rate limit would let me avoid datastore timeouts, but looking at the log this error still shows up daily.
The counter I'm storing is not that important, so I'm not worried about these timeouts; still, I'm curious why Google App Engine can't handle this task properly even at a low rate like 5 tasks per second, and whether lowering the rate could be a possible solution.
A sharded counter on each question just to avoid timeouts seems like overkill to me.
EDIT:
I have set the rate limit to 1 task per second on the default queue; I'm still getting the same error.
A query can only live for 30 seconds. See my answer to this question for some sample code to break a query up using cursors.
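For what it's worth, a hedged sketch of that cursor technique with the old db API (the batch size and the processing step are placeholders):
def process_all_questions(batch_size=100):
    query = Question.all()
    cursor = None
    while True:
        if cursor:
            query.with_cursor(cursor)   # resume where the last batch ended
        batch = query.fetch(batch_size)
        if not batch:
            break
        for entity in batch:
            pass                        # placeholder: process each entity
        cursor = query.cursor()         # remember the position for the next batch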
Generally speaking, a timeout like this is usually because of write contention. If you've got a transaction going and you're writing a bunch of stuff to the same entity group concurrently, you run into write contention issues (a side effect of optimistic concurrency). In most cases, if you make your entity group smaller, that will usually minimize this problem.
In your specific case, based on the code above, it's most probably because you should be using a sharded counter to avoid piling up serialized writes on a single counter entity.
Another far less likely possibility (mentioned here only for completeness) is that the tablet your data is on is being moved.
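Coming back to the sharded-counter suggestion, a minimal sketch (untested; the kind, property names, and the 20-shard count are all invented): each increment picks a random shard so concurrent writes land on different entities, and reads sum the shards.
import random
from google.appengine.ext import db

NUM_SHARDS = 20  # arbitrary; more shards = more write throughput

class QuestionCounterShard(db.Model):
    # key_name: '<question_id>_<service>:<shard_number>'
    count = db.IntegerProperty(default=0)

def increment_counter(question_id, service):
    shard = random.randint(0, NUM_SHARDS - 1)
    key_name = '%s_%s:%d' % (question_id, service, shard)

    def txn():
        counter = QuestionCounterShard.get_by_key_name(key_name)
        if counter is None:
            counter = QuestionCounterShard(key_name=key_name)
        counter.count += 1
        counter.put()
    db.run_in_transaction(txn)

def get_count(question_id, service):
    total = 0
    for shard in range(NUM_SHARDS):
        counter = QuestionCounterShard.get_by_key_name(
            '%s_%s:%d' % (question_id, service, shard))
        if counter:
            total += counter.count
    return total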
