I'm just wondering if anyone else has seen this and if so, can you confirm that this is correct? The documentation claims, as you might expect, that 10,000 is the record limit for the system call:
Database.emptyRecycleBin(records);
not 200. Yet it's throwing an error at 200. The only thing I can think of is that this call occurs from within a batch Apex process.
It took a little over a week, and my supplying a failing test case to Salesforce support, but the issue is now being tracked as a Salesforce known issue, which suggests it may get addressed in the platform.
My workaround for now is to wrap the call in a Database.Batchable with a batch size of 200.
This is the only reference I could find to a limit of 200 on emptyRecycleBin(), so I dare say you are correct:
http://www.salesforce.com/us/developer/docs/api/Content/sforce_api_calls_emptyrecyclebin.htm
Adam, if you got shut down when attempting to log a case regarding this due to the whole Premier Support thing you should definitely escalate your case as it was handled incorrectly and SFDC needs to know about it. I had the same exact issue myself.
SOQL for loops may be a helpful option for working around this limit, since the `for (Account[] accounts : [SELECT Id FROM Account WHERE IsDeleted = true ALL ROWS])` form yields batches of 200.
Related
I followed the instructions at http://docs.vespa.ai/documentation/vespa-quick-start.html and issued a YQL-like curl (curl -s http://localhost:8080/search/?yql=select%20%2A%20from%20sources%20%2A%3B), but got the following error: "message": "Could not instantiate query from YQL". Could anyone point out whether I missed anything when starting the services?
I want to store all documents in physical memory for fast queries; is there any configuration that achieves that? By the way, are documents compressed by default? (I'd also like to avoid disk I/O when feeding documents.)
I'd appreciate it if anyone could share an internal architecture design doc for the content/search node. Thanks.
////
Point 1 works per comment #1.
I will let others respond to points 2 and 3, but my guess for point 1 is that you are missing the "where" clause in your YQL query, hence the failure.
3) We don't have up-to-date design documentation, sorry about that. If you ask more concrete questions, I can provide answers or find something for you (but I suppose you already did that in https://github.com/vespa-engine/vespa/issues/5434).
On 2), you can define your fields as attributes and use a custom document summary that references only attribute fields. See http://docs.vespa.ai/documentation/attributes.html and http://docs.vespa.ai/documentation/document-summaries.html.
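For reference, a minimal sketch of such a schema (field and summary names here are illustrative, not taken from the original question; check the linked docs for the exact syntax of your Vespa version):

```
search mydoc {
    document mydoc {
        field title type string {
            indexing: summary | index
        }
        field popularity type int {
            # attribute fields are held in memory
            indexing: summary | attribute
        }
    }
    # a summary class referencing only attribute fields can be
    # filled without touching the on-disk document store
    document-summary attributes-only {
        summary popularity type int {}
    }
}
```

Requesting that summary class at query time then avoids disk I/O for the summarized fields.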
I would like to aggregate exchanges, and when the aggregated exchange hits a certain size (say 20KB) I would like to mark it as closed.
I have a rudimentary implementation that checks the size of the exchange and returns true from my predicate at 18KB. However, if the aggregate is currently at 17KB and a 4KB message comes in, I will complete the aggregation at 21KB, which is too big.
Any ideas on how to solve this? Can I do something in the aggregation strategy to reject the join and start a new Exchange to aggregate on?
I figured I could put it through another process that checks the actual size, removes messages from the end until it fits, and pushes each removed message back through... but that seems a little ugly, because I would have a constantly compensating routine that would likely execute often.
Thanks in advance for any tips.
I think there is an eager completion check option (`eagerCheckCompletion`) you can use to mark it as complete when you have that 17 + 4 > 20 situation. Then it will complete the 17, and start a new group with the 4.
See the docs at: https://github.com/apache/camel/blob/master/camel-core/src/main/docs/eips/aggregate-eip.adoc
And you would also likely need to implement `PreCompletionAwareAggregationStrategy` and return true in that 17 + 4 > 20 situation, as otherwise it would group them together first and then complete, e.g. so it becomes 21. But by using both the eager completion check option and this interface you can do what you want.
https://github.com/apache/camel/blob/master/camel-core/src/main/java/org/apache/camel/processor/aggregate/PreCompletionAwareAggregationStrategy.java
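The decision the answer describes (complete the current group *before* adding a message that would push it over the limit) can be sketched language-agnostically. In Camel this would live in a Java PreCompletionAwareAggregationStrategy; the sketch below just shows the batching logic in Python, with illustrative names:

```python
def group_by_size(sizes, limit):
    """Group incoming message sizes into batches, completing a batch
    before adding a message that would push it over the limit
    (the 'pre-completion' behaviour)."""
    batches, current, total = [], [], 0
    for size in sizes:
        if current and total + size > limit:
            batches.append(current)   # complete the 17KB batch...
            current, total = [], 0    # ...and start a new one for the 4KB message
        current.append(size)
        total += size
    if current:
        batches.append(current)
    return batches

print(group_by_size([17, 4], 20))  # → [[17], [4]], not one 21KB batch
```

A message larger than the limit on its own still forms a (single-element) batch here; whether to reject such messages instead is a separate design choice.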
I've got a problem with the Gmail API and the resultSizeEstimate field of the messages.list method.
The value is completely wrong: if I query all the messages of INBOX using https://www.googleapis.com/gmail/v1/users/me/messages?labelIds=INBOX, I get a resultSizeEstimate of 51, even though I can fetch more than 51 messages at once with maxResults.
The thing is that I have more than 8k messages in my inbox, whether I look in the Gmail GUI or at the labels.get messagesTotal field.
That's not a problem when I want the total number of messages in a label, since labels.get works fine, but how can I get the total number of messages for a specific query? If I add a &q=... to my messages.list request, I still get a resultSizeEstimate of around 50, which is obviously wrong.
Thanks !
resultSizeEstimate is just that, an estimate. You should probably not rely on it if you need anything close to an exact count.
You can't get the total number of results of a query directly. You have to keep listing until there is no nextPageToken in the response; then you know you have gotten every result of the query.
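That loop might be sketched like this. The page-fetching call is injected so the counting logic stands alone; with the official google-api-python-client it would be something like `service.users().messages().list(userId='me', q=..., pageToken=token).execute()`, but treat that wiring as an assumption to check against the client docs:

```python
def count_query_results(fetch_page):
    """Count every result of a Gmail messages.list query by following
    nextPageToken until the response no longer contains one.

    fetch_page(page_token) -> one messages.list response dict.
    """
    total, token = 0, None
    while True:
        resp = fetch_page(token)
        total += len(resp.get('messages', []))
        token = resp.get('nextPageToken')
        if not token:
            return total
```

Note this costs one API call per page, so for a whole-label total the labels.get messagesTotal field stays the cheaper option.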
Edit: See my answer. Problem was in our code. MR works fine, it may have a status reporting problem, but at least the input readers work fine.
I have now run an experiment several times, and I am sure that mapreduce (or DatastoreInputReader) has odd behavior. I suspect this might have something to do with key ranges and how they are split, but that is just my guess.
Anyway, here's the setup we have:
- we have an NDB model called "AdGroup"; when creating new entities of this model, we use the same id returned from AdWords (it's an integer), but we use it as a string: AdGroup(id=str(adgroupId))
- we have 1,163,871 of these entities in our datastore (that's what the "Datastore Admin" page tells us - I know it's not an entirely accurate number, but we don't create/delete adgroups very often, so we can say for sure that the number is 1.1 million or more).
mapreduce is started (from another pipeline) like this:
yield mapreduce_pipeline.MapreducePipeline(
    job_name='AdGroup-process',
    mapper_spec='process.adgroup_mapper',
    reducer_spec='process.adgroup_reducer',
    input_reader_spec='mapreduce.input_readers.DatastoreInputReader',
    mapper_params={
        'entity_kind': 'model.AdGroup',
        'shard_count': 120,
        'processing_rate': 500,
        'batch_size': 20,
    },
)
So, I've tried to run this mapreduce several times today without changing anything in the code and without making changes to the datastore. Every time I ran it, the mapper-calls counter had a different value, ranging from 450,000 to 550,000.
Correct me if I'm wrong, but considering that I use the very basic DatastoreInputReader - mapper-calls should be equal to the number of entities. So it should be 1.1 million or more.
Note: the reason why I noticed this issue in the first place is because our marketing guys started complaining that "it's been 4 days after we added new adgroups and they still don't show up in your app!".
Right now, I can think of only one workaround - write all keys of all adgroups into a blobstore file (one per line) and then use BlobstoreLineInputReader. The writing to blob part would have to be written in a way that does not utilize DatastoreInputReader, of course. Should I go with this for now, or can you suggest something better?
Note: I have also tried using DatastoreKeyInputReader with the same code - the results were similar - mapper-calls were between 450,000 and 550,000.
So, finally, questions. Does it matter how you generate ids for your entities? Is it better to use int ids instead of string ids? In general, what can I do to make it easier for mapreduce to find and map all of my entities?
PS: I'm still in the process of experimenting with this, I might add more details later.
After further investigation we have found that the error was actually in our code. So, mapreduce actually works as expected (mapper is called for every single datastore entity).
Our code was calling some Google services functions that were sometimes failing (the wonderfully cryptic ApplicationError messages). Due to these failures, MR tasks were being retried. However, we had set a limit on taskqueue retries. MR neither detected nor reported this in any way - it was still showing "success" on the status page for all shards. That is why we thought everything was fine with our code and that there was something wrong with the input reader.
This is so weird...
First of all this query works in the datastore viewer, ie. it returns the correct row.
SELECT * FROM Level where short_id = 'Ec71eN'
But if I run this
Level.all().filter("short_id = ", 'Ec71eN').get()
it returns None, if I run this:
db.GqlQuery("SELECT * FROM Level where short_id = '%s'" % 'Ec71eN').get()
it also returns None. If I run this:
level = Level.get_by_id(189009)
it returns the correct row (189009 is the id for the correct row)
Puzzling? What can be wrong here? I have never seen anything like this before; it worked correctly for at least a couple of weeks in production... I think I now have at least two cases, starting today, where it doesn't work.
UPDATE: This cannot be an eventual consistency problem, since the row was 7 hours old when I tried the above. I had two rows with the same symptoms, strangely both generated by the same user. They were both "fixed" after I did a manual fetch of their ids by uploading special-case code like:
if short_id == CASE_1_SHORT_ID:
    level = Level.get_by_id(CASE_1_ID)
After that the query worked as usual.
Are you using the HRD? Nothing's wrong. You know it's supposed to be eventually consistent, right?
Query operations are eventually consistent.
Get-by-id operations are fully consistent.
What you describe is correct datastore behavior. It's a bit odd that the datastore viewer operation returns the correct result, but it might have hit a separate tablet on the datastore operation.
Given that the row was created 7 hours earlier, that is odd: 'eventual consistency' should generally resolve within seconds to minutes.
If eventual consistency IS the problem, run the same query method a bunch of times and see if it returns the same result. If it continuously returns the same result with the same method, then it is more than likely not an eventual consistency problem. You should also switch to the NDB API for querying data - it's 1000 times better, and Guido worked on it, so you know it's good. Does NDB show the same inconsistency?
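That "run it a bunch of times" check can be sketched as a small helper. The query callable is injected; in the question's code it would be something like `lambda: Level.all().filter("short_id = ", "Ec71eN").get()` (a sketch, not the author's actual fix):

```python
import time

def query_until_consistent(run_query, attempts=5, delay=1.0):
    """Re-run an eventually consistent datastore query a few times.

    If it never returns a result, the miss is probably not (just)
    eventual consistency - e.g. fall back to a strongly consistent
    get_by_id() and investigate further.
    """
    for attempt in range(attempts):
        result = run_query()
        if result is not None:
            return result
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```

If this keeps returning None for a row known to be 7 hours old, the symptom matches the question's case and points at something other than the consistency window.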