Caching in CakePHP based on a per-record expiry date

I've run into a bit of a dilemma with Cake 2.4's caching system and I was wondering if anyone can think of a simple solution.
I have made a function within a component which I use for connecting to an API to obtain data. I also want to cache the data in order to make it more efficient.
The problem is that exactly when the cache expires depends on a field in the $result array (from the API) called $result['expiry']. So, I've done this:
class ApiComponent extends Component {

    public function getRecords($data) {
        $cacheName = $data['Model'] . "_" . $data['action'];

        // IMPOSSIBLE because we don't have $result:
        // Cache::set(array('duration' => '+' . $result['expiry'] . ' days'));
        $result = Cache::read($cacheName);

        if (!$result) {
            $result = $this->connect('getRecords', $data);
            Cache::set(array('duration' => '+' . $result['expiry'] . ' days'));
            Cache::write($cacheName, $result);
        }

        return $result;
    }

    public function connect($endpoint, $data) {
        // Code for connecting to the API
    }
}
The problem is that, for some reason, Cake needs me to specify the expiry for a write AND a read, despite the expiry timestamp being present in the cache file itself.
I've marked the problematic line as IMPOSSIBLE in my example above. According to the book, this is how it's supposed to work. Unfortunately it doesn't make much sense to me.
I understand having to set the expiry before a write, but why do I need to set it before a read when the cache file itself contains a UNIX timestamp specifying exactly when the cache expires, based on what I set at write time?
Please note that accessing the API only costs me about 100 ms, so there wouldn't be any point saving the expiry to the database and reading that every time, unless I also cached the local database output, but that seems like a truly bizarre thing to have to do: caching a cache expiry!

Theoretically you don't necessarily need the duration when reading from the cache. The reason for having it available on read is to be able to invalidate the cache in case the duration has changed since the data was cached.
https://github.com/cakephp/cakephp/commit/caa7bb621871706baf664b5e2e4f562353f3671f
Unfortunately there is no clean way of overcoming this auto-magic: a duration value will be used no matter what. When you don't define a custom one, the one from the default configuration is used, and if that duration is shorter than the one used on write, your cache is going to be invalidated prematurely.
A (kinda ugly) workaround would be to set an extraordinarily high dummy duration for the read operation. You lose the auto-magic, but your cache is not going to be invalidated prematurely:
Cache::set(array('duration' => '+10 years'));
$result = Cache::read($cacheName);
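Applied to the component from the question, the whole getRecords() method might look roughly like this. This is just a sketch of the workaround, untested:

public function getRecords($data) {
    $cacheName = $data['Model'] . "_" . $data['action'];

    // Dummy duration so the default config can never invalidate the read prematurely.
    Cache::set(array('duration' => '+10 years'));
    $result = Cache::read($cacheName);

    if (!$result) {
        $result = $this->connect('getRecords', $data);

        // Real duration for the write, taken from the API response.
        Cache::set(array('duration' => '+' . $result['expiry'] . ' days'));
        Cache::write($cacheName, $result);
    }

    return $result;
}

Cache::set() only applies to the next cache operation, so each read and write here gets its own duration.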

Related

CakePHP caching not working in isEmpty condition

I'm trying to cache a query. The condition is: after the first request, if the query object is not empty, I want to cache it. I have tried the following:
public function getCategories()
{
    $categoriesTable = TableRegistry::getTableLocator()->get('Categories');

    if (\Cake\Cache\Cache::read('categories', 'redis_cache') == null) {
        $categories = $categoriesTable->find();

        if (!$categories->isEmpty()) {
            print("hello");
            $categories->cache('categories', 'redis_cache');
            debug(\Cake\Cache\Cache::read('categories', 'redis_cache'));
        }

        $this->set('categories', $categories);
    } else {
        $this->set('categories', \Cake\Cache\Cache::read('categories', 'redis_cache'));
    }

    $this->viewBuilder()->setOption('serialize', ['categories']);
}
Output I got in Postman:
1st time hit
hello
null
2nd time hit
hello
null
But after commenting out the if (!$categories->isEmpty()) condition:
1st time hit
{
"categories": [
{
"id": 2,
...
}
Also, if I write the condition like below, it works fine:
if (!$categoriesTable->find('all')->isEmpty()) // it's working
What am I doing wrong here?
Queries are lazily evaluated, meaning they are not executed until the data is actually requested. However, isEmpty() is one of the methods that causes the query to be executed, since it needs the results to tell whether there are any.
Your code sets the query cacher config afterwards, which has no effect: further evaluation/iteration of the query object will not execute the query again (which is when it would normally cache its results), but will access the runtime-buffered results instead, hence no data is cached.
If you need more control over how caching is handled, then you may want to avoid the built-in query caching and do it completely manually, i.e. read and also write the cache as necessary, as in the sketch below.
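A rough sketch of the manual approach, reusing the redis_cache config and Categories table from the question (untested, and assuming a CakePHP 4 setup where Cache::read() returns null on a miss):

public function getCategories()
{
    // Try the cache first.
    $categories = \Cake\Cache\Cache::read('categories', 'redis_cache');

    if ($categories === null) {
        // Cache miss: execute the query now and store the materialized results.
        $categoriesTable = TableRegistry::getTableLocator()->get('Categories');
        $categories = $categoriesTable->find()->all()->toArray();

        if (!empty($categories)) {
            \Cake\Cache\Cache::write('categories', $categories, 'redis_cache');
        }
    }

    $this->set('categories', $categories);
    $this->viewBuilder()->setOption('serialize', ['categories']);
}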
As far as your use case is concerned, if you have a script running that deletes and regenerates the data, you might want to consider using it to also clear the cache afterwards (bin/cake cache clear redis_cache). Then, if you implement the "try again until there's data" logic in the frontend, your backend code can be simplified to just read and cache whatever data there is; if there is none yet, your frontend will let the user know that the data isn't ready and poll again. While websockets or server-sent events might be better, long polling is certainly still preferable over having a script run for minutes on end.

How can I access the remaining time on the Google countdown timer from a watch face?

I'm building a watch face and want to display the remaining time on the timer.
I've been looking for ways to access (presumably) "com.google.android.deskclock" for the timer data, but have not found anything on the net.
Thank you for your help.
There's no official API for this, but because the system Clock app exposes this value via a complication, you can access it that way.
Start by specifying the timer complication as a default:
setDefaultComplicationProvider(myComplicationId,
        new ComponentName("com.google.android.deskclock",
                "com.google.android.deskclock.complications.TimerProviderService"),
        ComplicationData.TYPE_SHORT_TEXT);
If your watch face already supports complications, you could just feed this provider into that logic. Otherwise - if you want to do something else with the timer value - you'll need to extract it from the ComplicationData. So, in your WatchFaceService.Engine subclass:
@Override
public void onComplicationDataUpdate(int complicationId, ComplicationData complicationData) {
    super.onComplicationDataUpdate(complicationId, complicationData);

    if (complicationId == myComplicationId) {
        // This is the timer complication
        CharSequence timerValue = complicationData.getShortText()
                .getText(getBaseContext(), System.currentTimeMillis());
    }
}
This gives you the current timerValue to do whatever you'd like with.
A couple of caveats to this approach:
Because this isn't one of the published system providers, there's always a chance that it won't work on some watch somewhere - or that an update may break it.
The user will need to have granted the complications permission to your app. If your face is already showing other complications, this may be a non-issue, but otherwise it's your call whether this is an acceptable UX.
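If you do end up having to request that permission yourself, the wearable support library's ComplicationHelperActivity can do it. A rough sketch, launched from a config activity for example (MyWatchFaceService is a placeholder for your own watch face service):

// Ask the user to grant the complications permission for this watch face.
ComponentName watchFace = new ComponentName(
        getApplicationContext(), MyWatchFaceService.class);

Intent permissionIntent = ComplicationHelperActivity
        .createPermissionRequestHelperIntent(getApplicationContext(), watchFace);

startActivity(permissionIntent);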

Meteor one time or "static" publish without collection tracking

Suppose that one needs to send the same collection of 10,000 documents down to every client for a Meteor app.
At a high level, I'm aware that the server does some bookkeeping for every client subscription - namely, it tracks the state of the subscription so that it can send the appropriate changes for the client. However, this is horribly inefficient if each client has the same large data set where each document has many fields.
It seems that there used to be a way to send a "static" publish down the wire, where the initial query was published and never changed again. This seems like a much more efficient way to do this.
Is there a correct way to do this in the current version of Meteor (0.6.5.1)?
EDIT: As a clarification, this question isn't about client-side reactivity. It's about reducing the overhead of server-side tracking of client collections.
A related question: Is there a way to tell meteor a collection is static (will never change)?
Update: It turns out that doing this in Meteor 0.7 or earlier will incur some serious performance issues. See https://stackoverflow.com/a/21835534/586086 for how we got around this.
http://docs.meteor.com/#find:
Statics.find({}, {reactive: false})
Edited to reflect comment:
Do you have some information that the reactive: false param is only client side? You may be right; it's a reasonable, maybe even likely, interpretation. I don't have time to check, but I thought this may also be a server-side directive, saying not to poll the mongo result set. Willing to learn...
You say
However, this is horribly inefficient if each client has the same large data set where each document has many fields.
Now we are possibly discussing the efficiency of the server code and its polling of the mongo source for updates that happen outside of the server. Please make that another question, which is far above my ability to answer! I doubt that is happening once per connected client; more likely it is a sync between app server info and the mongo server.
The client requests you issue, including sorting, should all be labelled non-reactive. That is separate from whether you can issue them with sorting instructions, or whether they can be retriggered through other reactivity, but those need not include a trip to the server. Once each document reaches the client side, it is cached. You can still do whatever minimongo does, with no loss in ability. There is no client asking the server if there are updates, so you don't need to shut that off; the server pushes only when needed.
I think using the manual publish (this.added) still works to get rid of the overhead created by the server observing data for changes. The observers either need to be added manually or are created by returning a Collection.cursor.
If the data set is big you might also be concerned about the overhead of a merge box holding a copy of the data for each client. To get rid of that you could copy the collection locally and stop the subscription.
var staticData = new Meteor.Collection("staticData");

if (Meteor.isServer) {
    var dataToPublish = staticData.find().fetch(); // query mongo when server starts

    Meteor.publish("publishOnce", function () {
        var self = this;
        dataToPublish.forEach(function (doc) {
            self.added("staticData", doc._id, doc); // sends data to client and will not continue to observe collection
        });
        self.ready(); // mark the subscription ready so subHandle.ready() becomes true on the client
    });
}

if (Meteor.isClient) {
    var subHandle = Meteor.subscribe("publishOnce"); // fills client 'staticData' collection but also leaves a merge box copy of the data on the server
    var staticDataLocal = new Meteor.Collection(null); // to store data after the subscription stops

    Deps.autorun(function () {
        if (subHandle.ready()) {
            staticData.find({}).forEach(function (doc) {
                staticDataLocal.insert(doc); // move all data to the local copy
            });
            subHandle.stop(); // removes 'publishOnce' data from the merge box on the server but leaves the 'staticData' collection empty on the client
        }
    });
}
update: I added comments to the code to make my approach clearer. The Meteor docs for stop() on the subscribe handle say "This will typically result in the server directing the client to remove the subscription's data from the client's cache", so maybe there is a way to stop the subscription (remove it from the merge box) that leaves the data on the client. That would be ideal and would avoid the copying overhead on the client.
Anyway, the original approach with set and flush would also have left the data in the merge box, so maybe that is all right.
As you've already pointed out yourself on Google Groups, you should use a Meteor Method for sending static data to the client.
And there is this neat package for working with Methods without async headaches.
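A minimal sketch of the Method approach for that era of Meteor (the StaticData collection and getStaticData method names are just placeholders):

// server
Meteor.methods({
    getStaticData: function () {
        // One-off fetch: no cursor is published, so nothing is observed or tracked.
        return StaticData.find({}).fetch();
    }
});

// client
var staticDataLocal = new Meteor.Collection(null); // purely local, unmanaged collection

Meteor.call('getStaticData', function (error, docs) {
    if (!error) {
        docs.forEach(function (doc) {
            staticDataLocal.insert(doc);
        });
    }
});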
Also, you could script out the data to a js file, as either an array or an object, minify it, then link to it as a distinct resource. See
http://developer.yahoo.com/performance/rules.html for "Add an Expires or a Cache-Control Header". You probably don't want Meteor to bundle it for you.
This would be the least traffic, and could make subsequent loads of your site much swifter.
As a response to a Meteor call, return an array of documents (use fetch()). No reactivity or logging. On the client, create a dependency when you do a query, or retrieve the key from the session, and it is reactive on the client.
Minimongo just does JS array/object manipulation with a syntax-interpreting DSL between you and your data.
The new fast-render package makes a one-time publish to a client collection possible.
var staticData = new Meteor.Collection('staticData');

if (Meteor.isServer) {
    FastRender.onAllRoutes(function () {
        this.find(staticData, {});
    });
}

Cache a large amount of data with CakePHP

I have a lot of data that I can potentially work with. It hardly ever changes, and it's not a problem at all to refresh the cache every now and then, but ideally the cache would be refreshed only when new data enters the table.
I have built with the understanding that CakePHP can cache an entire table and work off of that, even with slower caching like the File cache. Meaning that I query once for all the information, and if I ever say "findById(54)", it just searches that cache for ID 54 and deals it out. I do not want to cache only the content of ID 54. I also want to avoid changing the code I have already written. I know that it must be possible to just findAll() and cache that once, but I haven't built with that in mind and I'd rather not go through it all if I don't have to.
My question is: how do I get CakePHP to cache the entire table and query off that cache? Is it possible? I'm open to using memcached, but I'm thinking that to begin with I would only use File caching.
I have a lot of data
How much is "a lot"?
Override the find method in your AppModel
A simple technique to do what you want is to hijack find calls in your app model, and return a cached result if there is one, similar to this:
public function find($type = 'first', $params = array()) {
    $key = md5(serialize(func_get_args()));

    $return = Cache::read($key);
    if (false !== $return) {
        return $return;
    }

    $return = parent::find($type, $params);
    Cache::write($key, $return);

    return $return;
}
In this way, you populate the cache as you use the db, and have the benefit of being able to disable the cache, or have an empty cache, and still have your site work.
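To get the "refresh only when new data enters the table" behaviour the question asks about, one option would be to clear these cached results from the model callbacks. A sketch, assuming the cached finds live in the default cache config:

public function afterSave($created, $options = array()) {
    parent::afterSave($created, $options);
    Cache::clear(false, 'default'); // drop all cached find() results
}

public function afterDelete() {
    parent::afterDelete();
    Cache::clear(false, 'default');
}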

App Engine Memcache Key Prefix Across Versions

Greetings!
I've got a Google App Engine setup where memcached keys are prefixed with os.environ['CURRENT_VERSION_ID'] in order to produce a new cache on deploy, without having to flush the cache manually.
This was working just fine until it became necessary for development to run two versions of the application at the same time. This, of course, is yielding inconsistencies in caching.
I'm looking for suggestions as to how to prefix the keys now. Essentially, there needs to be a variable that changes across versions when any version is deployed. (Well, this isn't quite ideal, as the cache gets totally blown out.)
I was thinking of the following possibilities:
Make a RuntimeEnvironment entity that stores the latest cache prefix. Drawbacks: even if cached, it slows down every request, and it cannot be cached in memory, only in memcached, as deployment of another version may change it.
Use a per-entity version number. This yields very nice granularity in that the cache can stay warm for non-modified entities. The downside is we'd need to push to all versions when models are changed, which I want to avoid in order to test model changes out before deploying to production.
Forget the key prefix and use a global namespace for keys. Write a script to flush the cache on every deploy (see the sketch below). This actually seems just as good as, if not better than, the first idea: the cache is totally blown in both scenarios, and this one avoids the overhead of the runtime entity.
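For the third option, the flush itself could be as small as an admin-only handler that is requested once after each deploy. A sketch for the Python 2.7 runtime; the URL and handler name are made up:

from google.appengine.api import memcache
import webapp2


class FlushCacheHandler(webapp2.RequestHandler):
    """Hypothetical admin-only handler, requested once right after a deploy."""

    def get(self):
        memcache.flush_all()  # wipes the shared memcache for every version
        self.response.write('cache flushed')


app = webapp2.WSGIApplication([('/admin/flush-cache', FlushCacheHandler)])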
Any thoughts, different ideas greatly appreciated!
The os.environ['CURRENT_VERSION_ID'] value will be different for your two versions, so you will have separate caches for each one (the live one, and the dev/testing one).
So, I assume your problem is that when you "deploy" a version, you do not want the cache from development/testing to be used? (otherwise, like Nick and systempuntoout, I'm confused).
One way of achieving this would be to use the domain/host header in the cache key, since this is different for your dev/live versions. You can extract the host by doing something like this:
import urlparse

scheme, netloc, path, query, fragment = urlparse.urlsplit(self.request.url)
# Discard any port number from the hostname
domain = netloc.split(':', 1)[0]
This won't give particularly nice keys, but it'll probably do what you want (assuming I understood correctly).
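Used as a prefix, that host could then be folded into every key, roughly like this (host_prefixed_key is just an illustrative helper, not an App Engine API):

import urlparse


def host_prefixed_key(request_url, key):
    """Prefix a memcache key with the request host, so the dev and live
    versions (served from different hostnames) keep separate caches."""
    scheme, netloc, path, query, fragment = urlparse.urlsplit(request_url)
    domain = netloc.split(':', 1)[0]  # discard any port number
    return '%s:%s' % (domain, key)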
There was a bit of confusion with how I worded the question.
I ended up going for a per-class hash of attributes. Take this class for example:
import hashlib

from google.appengine.ext import db


class CachedModel(db.Model):

    @classmethod
    def cacheVersion(cls):
        # Memoize the computed hash on the class; a single leading underscore
        # avoids Python name mangling, so the hasattr() check can find it.
        if not hasattr(cls, '_cacheVersion'):
            props = cls.properties()
            prop_keys = sorted(props.keys())
            fn = lambda p: '%s:%s' % (p, str(props[p].model_class))
            string = ','.join(map(fn, prop_keys))
            cls._cacheVersion = hashlib.md5(string).hexdigest()[0:10]
        return cls._cacheVersion

    @classmethod
    def cacheKey(cls, key):
        return '%s-%s' % (cls.cacheVersion(), str(key))
That way, when entities are saved to memcached using their cacheKey(...), they will share the cache only if the actual class is the same.
This also has the added benefit that pushing an update that does not modify a model, leaves all cache entries for that model intact. In other words, pushing an update no longer acts as flushing the cache.
This has the disadvantage of hashing the class once per instance of the webapp.
UPDATE on 2011-3-9: I changed to a more involved but more accurate way of getting the version. It turns out using __dict__ yielded incorrect results, as its str representation includes pointer addresses. This new approach just considers the datastore properties.
UPDATE on 2011-3-14: So Python's hash(...) is apparently not guaranteed to be equal between runs of the interpreter; I was getting weird cases where a different App Engine instance was seeing different hashes. I'm using md5 (which is faster than SHA-1, which is faster than SHA-256) for now; there is no real need for it to be crypto-secure, just an OK hash function. I will probably switch to something faster, but for now I'd rather be bug-free. I also ensured the keys were getting sorted, not the property objects.
