How to keep large SOQL query cached in Salesforce Org. for speedy returns? - salesforce

We are having an issue where we need event relations for people(s), and are having problems with this very large group of people having almost 400 total event relations in this one week we are testing on... When trying to grab this large groups event relations, it will take forever and possibly time out. However, if you try again right after a timeout it goes in a couple seconds and is great. I was thinking this was salesforce just chaching the soql query/information and so it could act very quickly the second time. I tried to kind of trick it into having this query cached and ready by having a batch job that ran regularly to query every members event relations so when they tried to access our app the timeout issue would stop.
However, this is not even appearing to work. Even though the batch is running correctly and querying all these event relations, when you go to the app after a while without using it, it will still timeout or take very long the first time then be very quick after that.
Is there a way to successfully keep this cached so it will run very quickly when a user goes and tries to see all the event relations of a large group of people? With the developer console we saw that the event relation query was the huge time suck in the code and the real issue. I have been kind of looking into the Platform Cache of salesforce. Would storing this data there provide the solution I am looking for?

You should look into updating your query to be selective by using indexes in the where cause and custom indexes if necessary.

Related

Azure Search Updating Individual Documents

We are using Azure Search for various scenarios. We often need to update an individual document with changes that users make. We need to have these changes become visible in our indexes as soon as possible so that the stale time is as short as possible within reason.
What is the best strategy to handle this. We know that batch updates is the way to go but we need more immediate reflection of the changes.
Once a document is updated, how long does it take for the index to reflect this change.
Many thanks
If the updates are not very frequent, you can simply update the Azure Search document immediately (that is, using batches of size 1). On the other hand, if the updates are extremely frequent and you notice a high rate of failures with single-document batches, you will need to build some sort of "collector" mechanism to batch up updates. My recommendation would be to do the simple thing first: try single-document batches, and add batching logic if necessary.
Updated or newly indexed documents are reflected in the search results after a short delay, usually ranging from single milliseconds to 1-2 seconds. The delay depends on the service topology and indexing load.

Database time acces in Heroku with Play Framework

I am having a problem and I need your help.
I am working with Play Framework v1.2.4 in java, and my server is uploaded in the Heroku servers.
All works fine, I can access to my databases and all is ok, but I am experiment troubles when I do a couple of saves to the database.
I have a method who store data many times in the database and return a notification to a mobile phone. My problem is that the notification arrives before the database finish to save the data, because when it arrives I request for the update data to the server, and it returns the data without the last update. After a few seconds I have trying to update again, and the data shows correctly, therefore I think there is a time-access problem.
The idea would be that when the databases end to save the data, the server send the notification.
I dont know if this is caused because I am using the free version of the Heroku Servers, but I want to be sure before purchasing it.
In general all requests to cloud databases are always slower than the same working on your local machine. Even simply query that on your computer needs just 0.0001 sec can be as slow as 0.5 sec in the cloud. Reason is simple clouds providers uses shared databases + (geo) replications, which just... cannot be compared to the database accessed only by one program on the same machine.
Also keep in mind that free Heroku DB plans doesn't offer ANY database cache, which means that every query is fetched from the cloud directly.
As we don't know your application it's hard to say what is the bottleneck anyway almost for sure you have at least 3 ways to solve your problem. They are not an alternatives, probably you will need to use (or at least check) all of them.
You need to risk some basic plan and see how things changed with paid version, maybe it will be good enough for you, maybe not.
Redesign your application to make less queries. For an example instead sending 10 queries to select 10 different rows, you will need to send one query, which selects all 10 records at once.
Use Play's cache API to avoid repeating selecting the same set of data again and again. For an example, if you have some categories, which changes rarely, but you need category tree for each article, you don't need to fetch categories from DB every time, instead you can store a List of categories in cache, so you will need to use only one request to fetch article's content (which can be cached for some short time as well...)

What is the recommended way to build functionality similar to Stackoverflow's "Inbox"?

I have an asp.net-mvc website and people manage a list of projects. Based on some algorithm, I can tell if a project is out of date. When a user logs in, i want it to show the number of stale projects (similar to when i see a number of updates in the inbox).
The algorithm to calculate stale projects is kind of slow so if everytime a user logs in, i have to:
Run a query for all project where they are the owner
Run the IsStale() algorithm
Display the count where IsStale = true
My guess is that will be real slow. Also, on everything project write, i would have to recalculate the above to see if changed.
Another idea i had was to create a table and run a job everything minutes to calculate stale projects and store the latest count in this metrics table. Then just query that when users log in. The issue there is I still have to keep that table in sync and if it only recalcs once every minute, if people update projects, it won't change the value until after a minute.
Any idea for a fast, scalable way to support this inbox concept to alert users of number of items to review ??
The first step is always proper requirement analysis. Let's assume I'm a Project Manager. I log in to the system and it displays my only project as on time. A developer comes to my office an tells me there is a delay in his activity. I select the developer's activity and change its duration. The system still displays my project as on time, so I happily leave work.
How do you think I would feel if I receive a phone call at 3:00 AM from the client asking me for an explanation of why the project is no longer on time? Obviously, quite surprised, because the system didn't warn me in any way. Why did that happen? Because I had to wait 30 seconds (why not only 1 second?) for the next run of a scheduled job to update the project status.
That just can't be a solution. A warning must be sent immediately to the user, even if it takes 30 seconds to run the IsStale() process. Show the user a loading... image or anything else, but make sure the user has accurate data.
Now, regarding the implementation, nothing can be done to run away from the previous issue: you will have to run that process when something that affects some due date changes. However, what you can do is not unnecessarily run that process. For example, you mentioned that you could run it whenever the user logs in. What if 2 or more users log in and see the same project and don't change anything? It would be unnecessary to run the process twice.
Whatsmore, if you make sure the process is run when the user updates the project, you won't need to run the process at any other time. In conclusion, this schema has the following advantages and disadvantages compared to the "polling" solution:
Advantages
No scheduled job
No unneeded process runs (this is arguable because you could set a dirty flag on the project and only run it if it is true)
No unneeded queries of the dirty value
The user will always be informed of the current and real state of the project (which is by far, the most important item to address in any solution provided)
Disadvantages
If a user updates a project and then upates it again in a matter of seconds the process would be run twice (in the polling schema the process might not even be run once in that period, depending on the frequency it has been scheduled)
The user who updates the project will have to wait for the process to finish
Changing to how you implement the notification system in a similar way to StackOverflow, that's quite a different question. I guess you have a many-to-many relationship with users and projects. The simplest solution would be adding a single attribute to the relationship between those entities (the middle table):
Cardinalities: A user has many projects. A project has many users
That way when you run the process you should update each user's Has_pending_notifications with the new result. For example, if a user updates a project and it is no longer on time then you should set to true all users Has_pending_notifications field so that they're aware of the situation. Similarly, set it to false when the project is on time (I understand you just want to make sure the notifications are displayed when the project is no longer on time).
Taking StackOverflow's example, when a user reads a notification you should set the flag to false. Make sure you don't use timestamps to guess if a user has read a notification: logging in doesn't mean reading notifications.
Finally, if the notification itself is complex enough, you can move it away from the relationship between users and projects and go for something like this:
Cardinalities: A user has many projects. A project has many users. A user has many notifications. A notifications has one user. A project has many notifications. A notification has one project.
I hope something I've said has made sense, or give you some other better idea :)
You can do as follows:
To each user record add a datetime field sayng the last time the slow computation was done. Call it LastDate.
To each project add a boolean to say if it has to be listed. Call it: Selected
When you run the Slow procedure set you update the Selected fileds
Now when the user logs if LastDate is enough close to now you use the results of the last slow computation and just take all project with Selected true. Otherwise yourun again the slow computation.
The above procedure is optimal, becuase it re-compute the slow procedure ONLY IF ACTUALLY NEEDED, while running a procedure at fixed intervals of time...has the risk of wasting time because maybe the user will neber use the result of a computation.
Make a field "stale".
Run a SQL statement that updates stale=1 with all records where stale=0 AND (that algorithm returns true).
Then run a SQL statement that selects all records where stale=1.
The reason this will work fast is because SQL parsers, like PHP, shouldn't do the second half of the AND statement if the first half returns true, making it a very fast run through the whole list, checking all the records, trying to make them stale IF NOT already stale. If it's already stale, the algorithm won't be executed, saving you time. If it's not, the algorithm will be run to see if it's become stale, and then stale will be set to 1.
The second query then just returns all the stale records where stale=1.
You can do this:
In the database change the timestamp every time a project is accessed by the user.
When the user logs in, pull all their projects. Check the timestamp and compare it with with today's date, if it's older than n-days, add it to the stale list. I don't believe that comparing dates will result in any slow logic.
I think the fundamental questions need to be resolved before you think about databases and code. The primary of these is: "Why is IsStale() slow?"
From comments elsewhere it is clear that the concept that this is slow is non-negotiable. Is this computation out of your hands? Are the results resistant to caching? What level of change triggers the re-computation.
Having written scheduling systems in the past, there are two types of changes: those that can happen within the slack and those that cause cascading schedule changes. Likewise, there are two types of rebuilds: total and local. Total rebuilds are obvious; local rebuilds try to minimize "damage" to other scheduled resources.
Here is the crux of the matter: if you have total rebuild on every update, you could be looking at 30 minute lags from the time of the change to the time that the schedule is stable. (I'm basing this on my experience with an ERP system's rebuild time with a very complex workload).
If the reality of your system is that such tasks take 30 minutes, having a design goal of instant gratification for your users is contrary to the ground truth of the matter. However, you may be able to detect schedule inconsistency far faster than the rebuild. In that case you could show the user "schedule has been overrun, recomputing new end times" or something similar... but I suspect that if you have a lot of schedule changes being entered by different users at the same time the system would degrade into one continuous display of that notice. However, you at least gain the advantage that you could batch changes happening over a period of time for the next rebuild.
It is for this reason that most of the scheduling problems I have seen don't actually do real time re-computations. In the context of the ERP situation there is a schedule master who is responsible for the scheduling of the shop floor and any changes get funneled through them. The "master" schedule was regenerated prior to each shift (shifts were 12 hours, so twice a day) and during the shift delays were worked in via "local" modifications that did not shuffle the master schedule until the next 12 hour block.
In a much simpler situation (software design) the schedule was updated once a day in response to the day's progress reporting. Bad news was delivered during the next morning's scrum, along with the updated schedule.
Making a long story short, I'm thinking that perhaps this is an "unask the question" moment, where the assumption needs to be challenged. If the re-computation is large enough that continuous updates are impractical, then aligning expectations with reality is in order. Either the algorithm needs work (optimizing for local changes), the hardware farm needs expansion or the timing of expectations of "truth" needs to be recalibrated.
A more refined answer would frankly require more details than "just assume an expensive process" because the proper points of attack on that process are impossible to know.

Database design, real time or batch processing

I am facing two options of how to update the database, and do not know which one is better for my situation. There are three tables in the database, which are used to read/store some user's information, such the url history or some inputs.
In real time, the database is accessible by users all the time, so the changes made to the database can be seen immediately by that user.
The batch processing hides the "update" from user, database is updated by parsing the log files, and such a process runs every X hours. So user can only see their changes after X hours.
Apart from the advantage/disadvantage of synchronized/asynchronized updates that user can see. What are the other benefits of choosing real-time or batch processing updating methods for database updating?
Thanks
It all depends on the amount of traffic you expect. If you want to scale your application, asynchronous processing is always recommended. But that does not mean that your users have to wait for X hours. You can have the process run every 5 minutes or even every minute.
This way you will reduce concurrency issues and at the same time users will be able to see their updated history with a little bit of delay.
See best practices for scalability in the book Scalability Rules
I would suggest you use EDA (Event Driven Architecture) which uses a middleware to
"glue" all of this.
http://searchsoa.techtarget.com/definition/event-driven-architecture
One advice : Keep away from batch processes.
Today, everything tends to be more and more real-time. Imagine if you would receive my answer in X hours... would you be satisfied? :)
If you give us more Info, we could also help you more.
I see that your input comes from a log file? Can this be changed?
You could also implement the observer pattern.

displaying # views on a page without hitting database all the time

More and more sites are displaying the number of views (and clicks like on dzone.com) certain pages receive. What is the best practice for keeping track of view #'s without hitting the database every load?
I have a bunch of potential ideas on how to do this in my head but none of them seem viable.
Thanks,
first time user.
I would try the database approach first - returning the value of an autoincrement counter should be a fairly cheap operation so you might be surprised. Even keeping a table of many items on which to record the hit count should be fairly performant.
But the question was how to avoid hitting the db every call. I'd suggest loading the table into the webapp and incrementing it there, only backing it up to the db periodically or on webapp shutdown.
One cheap trick would be to simply cache the value for a few minutes.
The exact number of views doesn't matter much anyway since, on a busy site, in the time a visitor goes through the page, a whole batch of new views is already in.
One way is to use memcached as a counter. You could modify this rate limit implementation to instead act as general counter. The key could be in yyyymmddhhmm format with an expiration of 15 or 30 minutes (depending on what you consider to be concurrent visitors) and then simply get those keys when displaying the page.
Nice libraries for communicating with the memcache server are available in many languages.
You could set up a flat file that has the number of hits in it. This would have issues scaling, but it could work.
If you don't care about displaying the number of page views, you could use something like google analytics or piwik. Both make requests after the page is already loaded, so it won't impact load times. There might be a way to make a ajax request to the analytics server, but I don't know for sure. Piwik is opensource, so you can probably hack something together.
If you are using server side scripting, increment it in a variable. It's likely to get reset if you restart the services so not such a good idea if accuracy is needed.

Resources