Salesforce SOQL query - Jersey read timeout error

I'm having a problem on a batch job that has a simple SOQL query that returns a lot of records. More than a million.
The query, as it is, cannot be optimized much further according to SOQL best practices. (At least, as far as I know. I'm not an SF SOQL expert.)
The problem is that I'm getting -
Caused by: javax.ws.rs.ProcessingException: java.net.SocketTimeoutException: Read timed out
I tried bumping up the Jersey read timeout value from 30 seconds to 60 seconds, but it still times out.
Any recommendations on how to deal with this issue? Is there a recommended value for the read timeout parameter for a query that returns that much data?
The query is like this:
SELECT Id, field1, field2__c, field3__c, field3__c FROM Object__c
WHERE field2__c = true AND (not field3 like '\u0025Some string\u0025')
ORDER BY field4__c ASC

In no specific order...
Batches written in Apex time out after 2 minutes, so maybe set the same value in your Java application.
Run your query in Developer Console using the query plan feature (you will probably have to put a real % in there, not \u0025). Pay attention to which part has a "Cost" column value > 1.
What are the field types? Plain checkbox and text, or some complex formulas?
Is that text static, or does it change depending on what your app needs? Would you consider filtering out the string in your code rather than in SOQL? It's counter-intuitive to return more records than you really need, but it might be an option.
Would you consider making a formula field with either the whole logic or just the string search, and then asking SF to index the formula? Or maybe make another field (another checkbox?) holding "yes, it contains that text", and set its value by workflow (essentially, prepare your data a bit so you can query it efficiently later).
Read up on skinny tables and see if that's something that could work for you (needs SF support).
Can you make an analytic snapshot of your data (make a report, make SF save the results to a helper object, query that object)? Even if it only contained lookups to your original source, so that you always access fresh values, it could help. Might be a storage killer though.
Have you considered "big objects" and async SOQL?
I'm not proud of it but in the past I had some success badgering the SF database. Not via API but if I had a nightly batch job that was timing out I kept resubmitting it and eventually 3rd-5th time it managed to start. Something in the query optimizer, creation of cursor in underlying Oracle database, caching partial results... I don't know.
What's in the ORDER BY? Some date field? If you need the records updated since X first, then maybe the replication API could help by getting the IDs first.
Does it make sense to use LIMIT 200, for example? Which API are you using, SOAP or REST? It might be that returning smaller chunks (SOAP: batch size, REST API: special header) would help it finish faster.
When all else fails (but do contact SF support and make sure you've exhausted the options), maybe restructure the whole thing. Make SF push data to you whenever it changes, rather than pulling it. There's the "Streaming API" (CometD implementation, Bayeux protocol, however these are called), "Change Data Capture" and "Platform Events" for nice event bus-driven architecture decisions, replaying old events up to 3 days back if the client was down and couldn't listen... But that's a totally different topic.
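To make the "smaller chunks" and "filter in your code" suggestions a bit more concrete, here is a rough sketch in Python rather than Java. It assumes the REST query response shape (records, done, nextRecordsUrl). The fetch_page callable and the fake pages are hypothetical stand-ins for whatever HTTP client the job really uses. The substring check is case-insensitive and keeps records where the field is empty, which is my assumption about what the original filter should do.

```python
from typing import Callable, Dict, Iterator, List, Optional


def iter_records(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield records page by page instead of waiting on one huge response.

    fetch_page(None) is assumed to run the initial query and fetch_page(url)
    to fetch the page behind a nextRecordsUrl (hypothetical helper).
    """
    url = None
    while True:
        page = fetch_page(url)
        yield from page["records"]
        if page.get("done", True):
            return
        url = page["nextRecordsUrl"]


def without_substring(records, field: str, needle: str) -> List[Dict]:
    """Client-side replacement for the NOT ... LIKE '%needle%' filter."""
    needle = needle.lower()
    return [r for r in records if needle not in (r.get(field) or "").lower()]


# Fake two-page result set standing in for the real API.
_PAGES = {
    None: {
        "done": False,
        "nextRecordsUrl": "/page2",
        "records": [
            {"Id": "1", "field3__c": "keep me"},
            {"Id": "2", "field3__c": "has Some string inside"},
        ],
    },
    "/page2": {"done": True, "records": [{"Id": "3", "field3__c": None}]},
}

kept = without_substring(iter_records(_PAGES.__getitem__), "field3__c", "Some string")
print([r["Id"] for r in kept])
```

Each page is a separate, short HTTP call, so the read timeout applies per page rather than to the whole million-row result.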

Salesforce Admin Sharing Rule

[Note: There is a Teacher object with fields such as Teacher Name, DateofJoining, and a formula field called Experience]
My task was to create a Public Group containing another user,
and this user should only see teachers who have more than 2 years of experience.
But when I create a criteria-based sharing rule, the Experience field doesn't show up because it is a formula field.
So I had the idea of creating a new field (maybe a text or number data type) that would hold the value of Experience. (But I have no idea how to implement this.)
Is there a way to implement this?
Any other solution is also well appreciated!
Hard to say.
Normal trick would be to create a helper field (text, number, whatever) and have a piece of functionality that populates it. An "early flow" or "before insert, before update" trigger ideally. Worst case a normal flow, process builder or "after insert, after update" trigger. Something like "if Experience__c != 'your formula here' then Experience__c = 'your formula here'". Consult normal SF help and Trailhead if you've never used early flows.
You'd make a one-off data fix to populate existing records and job done; a normal field should be selectable as sharing rule criteria.
=====
But I smell trouble with your formula. What exactly do you have there, something like Experience__c = (TODAY() - DateofJoining__c) / 365? That's a bit evil. Formulas with TODAY(), NOW() or anything with $ (roughly speaking, who's looking at the data: user's name, profile, role... not what's actually on the record itself) are "nondeterministic". Unpredictable.
A "today()" changes just like that, without updating the record. Sure, when you view the record a fresh value will be calculated, but other than that LastModifiedDate doesn't change, and there's no magical trigger running at midnight that rechecks sharing (especially since there's no single midnight; you could have users in multiple timezones). SF just doesn't allow nondeterministic fields in many places, see https://salesforce.stackexchange.com/q/32122/799
So if you do rely on TODAY() in your formula, you might have to make a "scheduled flow" or read about schedulable, batchable Apex. Create a nightly job that would run and recalculate your helper field with the right experience. You'd probably even need both solutions: a "before save" flow for new data created today and a nightly job to advance the clock on existing old data...
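The logic of that nightly recalculation, sketched in Python purely for illustration (in Salesforce it would live in a scheduled flow or batchable Apex). The ExperienceYears key is a hypothetical helper field name, and counting whole anniversaries instead of dividing by 365 is my own choice, not something from the question.

```python
from datetime import date


def years_of_experience(date_of_joining: date, today: date) -> int:
    """Whole years between joining and today, counted by anniversaries."""
    years = today.year - date_of_joining.year
    if (today.month, today.day) < (date_of_joining.month, date_of_joining.day):
        years -= 1  # this year's anniversary has not happened yet
    return years


def refresh_helper(records, today: date):
    """Update stale helper values and return only the records that changed."""
    changed = []
    for rec in records:
        fresh = years_of_experience(rec["DateofJoining"], today)
        if rec.get("ExperienceYears") != fresh:  # write only when needed
            rec["ExperienceYears"] = fresh
            changed.append(rec)
    return changed


teachers = [
    {"Name": "A", "DateofJoining": date(2020, 6, 1), "ExperienceYears": 2},
    {"Name": "B", "DateofJoining": date(2022, 1, 15), "ExperienceYears": 1},
]
changed = refresh_helper(teachers, today=date(2023, 6, 1))
print([(t["Name"], t["ExperienceYears"]) for t in changed])
```

Only writing the records whose value actually changed keeps the nightly job from touching (and re-evaluating sharing for) every teacher every night.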

How to Implement Stateful Polling SQL Consumer in Camel?

I've been reading around the forums and documentation, and I can't seem to find anything related to what I am looking for, which is a huge surprise to me as it would seem to be a common requirement, so I suspect that there is a better way of approaching this.
I have a database, which I want to run a SQL Consumer on, and I want to query only records that have been modified since the last time I queried.
It appears that you cannot parameterise a SQL Consumer query, which would seem to be the first hurdle, and secondly, even if I could parameterise the consumer query, I don't appear to be able to store the result between one query and the next.
My assumption is that I would want to store the highest dateModified value, and subsequently query records where the dateModified value is strictly greater than the stored value.
(I realise that this is not foolproof, as there could be millisecond issues, but I can't think of another way of achieving this without changing the application or database.)
The only way I can see of using a SQL Consumer is to store the highest dateModified in a custom table in the system database (which I would rather not change) and include some sort of
WHERE dateModified > interfaceDataTable.lastDateModified
in the SQL Query, and an
UPDATE interfaceDataTable SET lastDateModified = :#$latestDateModifiedValue
in the onConsume SQL.
However, I'd much rather not make any changes to the source database, as that will have further implications for testing etc.
I have the sense I'm barking up the wrong tree here. Is there a better way of approaching this?
Yes, camel-sql currently does not support dynamic parameters in the consumer query, such as calling a Java bean method.
I have logged a ticket to see if we can implement this: https://issues.apache.org/jira/browse/CAMEL-12734
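For reference, the "remember the highest dateModified" idea from the question, sketched outside Camel with plain Python and sqlite3. This is not the camel-sql API. It is only the polling pattern, with the high-water mark held by the poller so the source database stays untouched. The records table and its columns are made up for the example.

```python
import sqlite3


def poll_changes(conn, last_seen):
    """Return rows modified strictly after last_seen and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, dateModified FROM records "
        "WHERE dateModified > ? ORDER BY dateModified",
        (last_seen,),
    ).fetchall()
    new_mark = rows[-1][1] if rows else last_seen
    return rows, new_mark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, dateModified TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(1, "2024-01-01T10:00:00"), (2, "2024-01-01T11:00:00")],
)

mark = "1970-01-01T00:00:00"  # kept by the poller, not in the source database
first, mark = poll_changes(conn, mark)
conn.execute("INSERT INTO records VALUES (3, '2024-01-01T12:00:00')")
second, mark = poll_changes(conn, mark)
print(len(first), [r[0] for r in second], mark)
```

As the question already notes, a strict "greater than" can miss rows that share the timestamp of the last row seen, so this is only as reliable as the resolution of dateModified.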

Searching a nvarchar(max) field

Our application connects to a SQL Server database. An nvarchar(max) column has been added and must be included in the search. The number of records in this DB is only in the tens of thousands, and only a few hundred people use the application. I've been told to explore Full-Text Search; is this necessary?
This is like asking, I work 5 miles away, and I was told to consider buying a car. Is this necessary? Too many variables to give you a simple and correct answer to your question. For example, is it a nice walk? Is there public transit available? Is your schedule flexible? Do you have to go far for lunch or run errands after work?
Full-Text Search can help if your typical searches are going to be WHERE col LIKE '%foo%' - but whether it is necessary depends on how large this column will get, whether your searches are true wildcard searches, your tolerance for millisecond vs. nanosecond queries, the amount of concurrency, even seemingly extraneous stuff like whether the data is always in memory and can be searched more efficiently.
The better answer is that you should try it. Populate a table with a copy of your data, add a full-text index, and see if your typical workload improves by using full-text queries instead of LIKE. It probably will, but there's no way for us to know for sure even if you add more specifics than ballpark row counts.
In a similar situation I ended up making a table structure that was more search friendly and indexable, then setting up a batch job to copy records from the live database to the reporting one.
In my case the original data didn't come close to needing an nvarchar(max) column so I could get away with that. Your mileage may vary. In any case, the answer is "try a few things and see what works for you".

How to efficiently check if a result set changed and serve it to a web application for syndication

Here is the scenario:
I am handling a SQL Server database with a stored procedure which takes care of returning headers for Web feed items (RSS/Atom) I am serving as feeds through a web application.
This stored procedure should, when called by the service broker task running at a given interval, verify if there has been a significant change in the underlying data - in that case, it will trigger a resource intensive activity of formatting the feed item header through a call to the web application which will get/retrieve the data, format them and return to the SQL database.
There the header would be stored ready for a request for RSS feed update from the client.
Now, trying to design this to be as efficient as possible, I still have a couple of decision points I'd like to get your suggestions about.
My tentative approach for the stored procedure would be:
get the data together in an in-memory table,
create a subquery with the signature columns which change with the information,
convert them to XML with FOR XML AUTO,
hash the result with MD5 (with HASHBYTES or fn_repl_hash_binary depending on the size of the result),
verify whether the hash matches the one stored in the table where I am storing the HTML waiting for the feed requests,
if the hash matches, do nothing; otherwise proceed with the updates.
The first doubt is the best way to check if the base data have changed.
Converting to XML significantly inflates the data (which slows hashing), and I am potentially not using the result for anything apart from hashing: is there a better way to perform the check, or to pack all the data together for hashing (something CSV-like)?
The query is merging and aggregating data from multiple tables, so I would not rely on table timestamps, as a change in them is not necessarily related to a change in the result set.
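To illustrate the kind of compact packing I have in mind instead of XML, here is a sketch in Python (not T-SQL, so it only shows the idea, not how it would run inside the stored procedure). It uses SHA-256 where the steps above say MD5, and the length prefixes and the NULL sentinel are my own assumptions about how to keep the packing unambiguous.

```python
import hashlib


def result_set_hash(rows) -> str:
    """Hash the signature columns row by row without building one big string."""
    digest = hashlib.sha256()
    for row in rows:
        for value in row:
            if value is None:
                digest.update(b"\xff\xff\xff\xff")  # sentinel so NULL differs from ""
                continue
            data = str(value).encode("utf-8")
            # Length-prefix each value so ("ab", "c") and ("a", "bc") hash differently.
            digest.update(len(data).to_bytes(4, "big"))
            digest.update(data)
        digest.update(b"\x00ROW")  # row boundary marker
    return digest.hexdigest()


stored = result_set_hash([(1, "Title A", "2024-01-01"), (2, "Title B", "2024-01-02")])
same = result_set_hash([(1, "Title A", "2024-01-01"), (2, "Title B", "2024-01-02")])
edited = result_set_hash([(1, "Title A", "2024-01-01"), (2, "Title B!", "2024-01-02")])
print(stored == same, stored == edited)
```

The rows have to arrive in a stable order for this to work, so the query feeding it would need a deterministic ORDER BY.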
The second point is: what is the best way to serve the data to the webapp for reformatting?
- I might push the data through a CLR function to the web application to get the data formatted (but this is synchronous, and for multiple feed items it would create an unsustainable delay),
or
- I might instead save the result set and trigger multiple asynchronous calls through the service broker. The web app could then retrieve the stored data instead of re-running the expensive query that produced it.
Since I have different formats depending on the feed item category, I cannot use the same table format - so storing to a table is going to be hard.
I might serialize to XML instead.
But is this going to provide any significant gain compared to re-running the query?
For the efficient caching bit, have a look at query notifications. The tricky bit in implementing this in your case is you've stated "significant change" whereas query notifications will trigger on any change. But the basic idea is that your application subscribes to a query. When the results of that query change, a message is sent to the application and it does whatever it is programmed to do (typically refreshing cached data).
As for serving the data to your app, there's a saying in the business: "don't go borrowing trouble". Which is to say if the default method of serving data (i.e. a result set w/o fancy formatting) isn't causing you a problem, don't change it. Change it only if and when it's causing you a significant enough headache that your time is best spent there.

Can I do transactions and locks in CouchDB?

I need to do transactions (begin, commit or rollback), locks (select for update).
How can I do it in a document model db?
Edit:
The case is this:
I want to run an auctions site.
And I am thinking about how to handle direct purchases as well.
In a direct purchase I have to decrement the quantity field in the item record, but only if the quantity is greater than zero. That is why I need locks and transactions.
I don't know how to address that without locks and/or transactions.
Can I solve this with CouchDB?
No. CouchDB uses an "optimistic concurrency" model. In the simplest terms, this just means that you send a document version along with your update, and CouchDB rejects the change if the current document version doesn't match what you've sent.
It's deceptively simple, really. You can reframe many normal transaction based scenarios for CouchDB. You do need to sort of throw out your RDBMS domain knowledge when learning CouchDB, though. It's helpful to approach problems from a higher level, rather than attempting to mold Couch to a SQL based world.
Keeping track of inventory
The problem you outlined is primarily an inventory issue. If you have a document describing an item, and it includes a field for "quantity available", you can handle concurrency issues like this:
Retrieve the document, take note of the _rev property that CouchDB sends along
Decrement the quantity field, if it's greater than zero
Send the updated document back, using the _rev property
If the _rev matches the currently stored number, be done!
If there's a conflict (when _rev doesn't match), retrieve the newest document version
In this instance, there are two possible failure scenarios to think about. If the most recent document version has a quantity of 0, you handle it just like you would in a RDBMS and alert the user that they can't actually buy what they wanted to purchase. If the most recent document version has a quantity greater than 0, you simply repeat the operation with the updated data, and start back at the beginning. This forces you to do a bit more work than an RDBMS would, and could get a little annoying if there are frequent, conflicting updates.
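The retry loop described above, as a rough Python sketch. FakeStore is a hypothetical in-memory stand-in that only imitates the "reject the write if _rev is stale" behaviour. It is not a CouchDB client, and real revisions are opaque strings rather than the integers used here.

```python
class Conflict(Exception):
    """Raised when the supplied _rev is not the current one."""


class FakeStore:
    """In-memory stand-in for a document store that checks _rev on write."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return dict(self._docs[doc_id])

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])
        saved = dict(doc, _rev=(current["_rev"] + 1 if current else 1))
        self._docs[doc["_id"]] = saved
        return saved


def buy_one(store, doc_id, max_attempts=5):
    """Decrement quantity if it is above zero; re-read and retry on conflict."""
    for _ in range(max_attempts):
        doc = store.get(doc_id)
        if doc["quantity"] <= 0:
            return False  # sold out, tell the user
        doc["quantity"] -= 1
        try:
            store.put(doc)
            return True
        except Conflict:
            continue  # someone else updated first; start over with fresh data
    raise RuntimeError("too many conflicting updates")


store = FakeStore()
store.put({"_id": "hammer", "quantity": 2})
results = [buy_one(store, "hammer") for _ in range(3)]
print(results, store.get("hammer")["quantity"])
```

The max_attempts cap is there so a very contended item fails loudly instead of spinning forever.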
Now, the answer I just gave presupposes that you're going to do things in CouchDB in much the same way that you would in an RDBMS. I might approach this problem a bit differently:
I'd start with a "master product" document that includes all the descriptor data (name, picture, description, price, etc). Then I'd add an "inventory ticket" document for each specific instance, with fields for product_key and claimed_by. If you're selling a model of hammer, and have 20 of them to sell, you might have documents with keys like hammer-1, hammer-2, etc, to represent each available hammer.
Then, I'd create a view that gives me a list of available hammers, with a reduce function that lets me see a "total". These are completely off the cuff, but should give you an idea of what a working view would look like.
Map
function (doc) {
  // Emit one row per unclaimed inventory ticket, keyed by product.
  if (doc.type == 'inventory_ticket' && doc.claimed_by == null) {
    emit(doc.product_key, { 'inventory_ticket': doc._id, '_rev': doc._rev });
  }
}
This gives me a list of available "tickets", by product key. I could grab a group of these when someone wants to buy a hammer, then iterate through sending updates (using the id and _rev) until I successfully claim one (previously claimed tickets will result in an update error).
Reduce
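function (keys, values, rereduce) {
  // On rereduce, values are partial counts, so add them up instead of counting them.
  return rereduce ? sum(values) : values.length;
}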
This reduce function simply returns the total number of unclaimed inventory_ticket items, so you can tell how many "hammers" are available for purchase.
Caveats
This solution represents roughly 3.5 minutes of total thinking for the particular problem you've presented. There may be better ways of doing this! That said, it does substantially reduce conflicting updates, and cuts down on the need to respond to a conflict with a new update. Under this model, you won't have multiple users attempting to change data in primary product entry. At the very worst, you'll have multiple users attempting to claim a single ticket, and if you've grabbed several of those from your view, you simply move on to the next ticket and try again.
Reference: https://wiki.apache.org/couchdb/Frequently_asked_questions#How_do_I_use_transactions_with_CouchDB.3F
Expanding on MrKurt's answer. For lots of scenarios you don't need to have stock tickets redeemed in order. Instead of selecting the first ticket, you can select randomly from the remaining tickets. Given a large number tickets and a large number of concurrent requests, you will get much reduced contention on those tickets, versus everyone trying to get the first ticket.
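A small Python sketch of the random pick. The try_claim callable is a hypothetical stand-in for the conditional update against the ticket document (it should fail if the ticket was claimed in the meantime), and the dict here just simulates that.

```python
import random


def claim_ticket(tickets, buyer, try_claim, rng=random):
    """Try the candidate tickets in random order until one claim succeeds."""
    candidates = list(tickets)
    rng.shuffle(candidates)
    for ticket_id in candidates:
        if try_claim(ticket_id, buyer):
            return ticket_id
    return None  # every candidate was taken by someone else


# Simulated conditional update: succeeds only if the ticket is still unclaimed.
claimed = {"hammer-1": "someone-else"}


def try_claim(ticket_id, buyer):
    if ticket_id in claimed:
        return False
    claimed[ticket_id] = buyer
    return True


all_tickets = ["hammer-1", "hammer-2", "hammer-3"]
got = claim_ticket(all_tickets, "alice", try_claim, rng=random.Random(0))
print(got in {"hammer-2", "hammer-3"}, claimed[got])
```

With many buyers each shuffling independently, they mostly start on different tickets, which is where the reduced contention comes from.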
A design pattern for RESTful transactions is to create a "tension" in the system. For the popular example use case of a bank account transaction, you must make sure the totals of both involved accounts are updated:
Create a transaction document "transfer USD 10 from account 11223 to account 88733". This creates the tension in the system.
To resolve any tension scan for all transaction documents and
If the source account is not updated yet update the source account (-10 USD)
If the source account was updated but the transaction document does not show this then update the transaction document (e.g. set flag "sourcedone" in the document)
If the target account is not updated yet update the target account (+10 USD)
If the target account was updated but the transaction document does not show this then update the transaction document
If both accounts have been updated you can delete the transaction document or keep it for auditing.
The scanning for tension should be done in a backend process for all "tension documents" to keep the periods of tension in the system short. In the above example there will be a short, anticipated period of inconsistency when the first account has been updated but the second has not been updated yet. This must be taken into account the same way you'll deal with eventual consistency if your CouchDB is distributed.
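One way the resolving scan could look, sketched in Python with plain dicts. The applied list on each account (recording which transfers have already been applied) is my assumption about how the scanner can tell that "the account was updated but the transaction document does not show this". The source_done and target_done flag names are likewise made up.

```python
def resolve_step(tx, accounts):
    """Perform at most one write for a transfer document; return True once resolved."""
    for side, sign in (("source", -1), ("target", +1)):
        account = accounts[tx[side]]
        if tx["_id"] not in account["applied"]:
            # Account not updated yet: apply the amount and record which transfer did it.
            account["balance"] += sign * tx["amount"]
            account["applied"].append(tx["_id"])
            return False
        if not tx.get(side + "_done"):
            # Account already updated, but the transfer document does not show it yet.
            tx[side + "_done"] = True
            return False
    return True


accounts = {
    "11223": {"balance": 100, "applied": []},
    "88733": {"balance": 50, "applied": []},
}
tx = {"_id": "tx-1", "source": "11223", "target": "88733", "amount": 10}

steps = 0
while not resolve_step(tx, accounts):
    steps += 1
print(steps, accounts["11223"]["balance"], accounts["88733"]["balance"])
```

Because every call does a single write and re-checks the state first, the scan can be interrupted and restarted at any point without applying an amount twice.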
Another possible implementation avoids the need for transactions completely: just store the tension documents and evaluate the state of your system by evaluating every involved tension document. In the example above this would mean that the total for an account is only determined as the sum of the values in the transaction documents where this account is involved. In CouchDB you can model this very nicely as a map/reduce view.
No, CouchDB is not generally suitable for transactional applications because it doesn't support atomic operations in a clustered/replicated environment.
CouchDB sacrificed transactional capability in favor of scalability. In order to have atomic operations you need a central coordination system, which limits your scalability.
If you can guarantee you only have one CouchDB instance or that everyone modifying a particular document connects to the same CouchDB instance then you could use the conflict detection system to create a sort of atomicity using methods described above but if you later scale up to a cluster or use a hosted service like Cloudant it will break down and you'll have to redo that part of the system.
So, my suggestion would be to use something other than CouchDB for your account balances, it will be much easier that way.
As a response to the OP's problem, Couch is probably not the best choice here. Using views is a great way to keep track of inventory, but clamping to 0 is more or less impossible. The problem being the race condition when you read the result of a view, decide you're ok to use a "hammer-1" item, and then write a doc to use it. The problem is that there's no atomic way to only write the doc to use the hammer if the result of the view is that there are > 0 hammer-1's. If 100 users all query the view at the same time and see 1 hammer-1, they can all write a doc to use a hammer 1, resulting in -99 hammer-1's. In practice, the race condition will be fairly small - really small if your DB is running localhost. But once you scale, and have an off site DB server or cluster, the problem will get much more noticeable. Regardless, it's unacceptable to have a race condition of that sort in a critical - money related system.
An update to MrKurt's response (it may just be dated, or he may have been unaware of some CouchDB features)
A view is a good way to handle things like balances / inventories in CouchDB.
You don't need to emit the docid and rev in a view. You get both of those for free when you retrieve view results. Emitting them - especially in a verbose format like a dictionary - will just grow your view unnecessarily large.
A simple view for tracking inventory balances should look more like this (also off the top of my head)
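function (doc) {
  if (doc.InventoryChange != undefined) {
    for (var product_key in doc.InventoryChange) {
      // Emit the change amount (not 1) so that _sum yields the running inventory.
      emit(product_key, doc.InventoryChange[product_key]);
    }
  }
}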
And the reduce function is even simpler
_sum
This uses a built in reduce function that just sums the values of all rows with matching keys.
In this view, any doc can have a member "InventoryChange" that maps product_key's to a change in the total inventory of them. ie.
{
"_id": "abc123",
"InventoryChange": {
"hammer_1234": 10,
"saw_4321": 25
}
}
Would add 10 hammer_1234's and 25 saw_4321's.
{
"_id": "def456",
"InventoryChange": {
"hammer_1234": -5
}
}
Would burn 5 hammers from the inventory.
With this model, you're never updating any data, only appending. This means there's no opportunity for update conflicts. All the transactional issues of updating data go away :)
Another nice thing about this model is that ANY document in the DB can both add and subtract items from the inventory. These documents can have all kinds of other data in them. You might have a "Shipment" document with a bunch of data about the date and time received, warehouse, receiving employee etc. and as long as that doc defines an InventoryChange, it'll update the inventory. As could a "Sale" doc, and a "DamagedItem" doc etc. Looking at each document, they read very clearly. And the view handles all the hard work.
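To see what the view computes, here is a plain Python imitation of the map step and the _sum reduce over the two example documents. It assumes the map step emits the change amount for each product key. The third document is added only to show that docs without InventoryChange are ignored.

```python
from collections import Counter


def map_inventory(doc):
    """Imitates the map step: yield (product_key, change) pairs."""
    for product_key, change in doc.get("InventoryChange", {}).items():
        yield product_key, change


def inventory(docs):
    """Imitates the _sum reduce: add up the changes per product key."""
    totals = Counter()
    for doc in docs:
        for product_key, change in map_inventory(doc):
            totals[product_key] += change
    return dict(totals)


docs = [
    {"_id": "abc123", "InventoryChange": {"hammer_1234": 10, "saw_4321": 25}},
    {"_id": "def456", "InventoryChange": {"hammer_1234": -5}},
    {"_id": "note-1", "text": "no InventoryChange, so the view ignores it"},
]
print(inventory(docs))
```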
Actually, you can in a way. Have a look at the HTTP Document API and scroll down to the heading "Modify Multiple Documents With a Single Request".
Basically you can create/update/delete a bunch of documents in a single post request to URI /{dbname}/_bulk_docs and they will either all succeed or all fail. The document does caution that this behaviour may change in the future, though.
EDIT: As predicted, from version 0.9 bulk docs no longer work this way.
Just use a lightweight solution like SQLite for the transactions; when a transaction completes successfully, replicate it and mark it as replicated in SQLite.
SQLite table
txn_id        txn_attribute1   txn_attribute2   ......   txn_status
dhwdhwu$sg1   x                y                         added/replicated
You can also delete the transactions which are replicated successfully.
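Roughly what I mean, as a Python/sqlite3 sketch. The item and txn tables, their columns and the replicate callable are all made up for the example. The replicate step stands in for whatever pushes the finished transaction to CouchDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id TEXT PRIMARY KEY, quantity INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE txn (txn_id INTEGER PRIMARY KEY, item_id TEXT, txn_status TEXT)"
)
conn.execute("INSERT INTO item VALUES ('hammer', 1)")
conn.commit()


def purchase(conn, item_id):
    """Decrement stock and record the transaction atomically; False if sold out."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE item SET quantity = quantity - 1 WHERE id = ? AND quantity > 0",
            (item_id,),
        )
        if cur.rowcount == 0:
            return False
        conn.execute(
            "INSERT INTO txn (item_id, txn_status) VALUES (?, 'added')", (item_id,)
        )
        return True


def mark_replicated(conn, replicate):
    """Push each 'added' transaction via replicate(), then flag it as replicated."""
    pending = conn.execute(
        "SELECT txn_id, item_id FROM txn WHERE txn_status = 'added'"
    ).fetchall()
    for txn_id, item_id in pending:
        replicate({"txn_id": txn_id, "item_id": item_id})
        with conn:
            conn.execute(
                "UPDATE txn SET txn_status = 'replicated' WHERE txn_id = ?", (txn_id,)
            )


sent = []
outcomes = (purchase(conn, "hammer"), purchase(conn, "hammer"))
mark_replicated(conn, sent.append)
print(outcomes, sent, conn.execute("SELECT txn_status FROM txn").fetchall())
```

The "quantity > 0" condition inside the UPDATE is what gives you the "only decrement if something is left" guarantee that the question asked about.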

Resources