CakePHP performance issues - cached view file size

On a CakePHP site (2.3.1) I have noticed that the cached view files are very large (10-60MB per page).
Typically I would expect view caching to store pure HTML output, but Cake is adding serialised PHP at the top of the files. From a performance perspective this large file size is problematic: it uses up gigabytes of space (we have thousands of pages) and is not suited to APC caching (default max file size 1MB).
This is an example block at the top of the cached view file:
<!--cachetime:1363985272--><?php
App::uses('StaticController', 'Controller');
$request = unserialize(base64_decode('<removed>'));
$response = new CakeResponse();
$controller = new StaticController($request, $response);
$controller->plugin = $this->plugin = '';
$controller->helpers = $this->helpers = unserialize(base64_decode('<removed>'));
$controller->layout = $this->layout = 'default';
$controller->theme = $this->theme = '';
$controller->viewVars = unserialize(base64_decode('<removed>'));
Router::setRequestInfo($controller->request);
$this->request = $request;
$this->viewVars = $controller->viewVars;
$this->loadHelpers();
extract($this->viewVars, EXTR_SKIP);
?>
I'd prefer no PHP in there at all, as the HTML below it is the static generated output. The serialised block is a massive overhead that accounts for almost all of the file size.
Cache setting in bootstrap.php:
Cache::config('default', array('engine' => 'Apc'));
At present my only option is to reduce the size of the view cache files. Adding something like Varnish is not possible on this server at this point in time.
Any tips to resolve the file size issue would be great.

I was able to vastly reduce the size of cached view files in the end by making these two changes. Hopefully it is useful to anyone who runs into a similar issue.
1 - Model
There were a number of relationships between models, but I was only actually using the data on one side of each relation. For example, an article had images, but I also had a relation to get the article belonging to an image, which I never used. Making the relations 'one way', i.e. only where required, cut down on heavy queries.
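For illustration, a minimal sketch of making a relation one-way (the model names are illustrative, not the actual ones from this project):
// Article -> Image is the only direction actually queried.
class Article extends AppModel {
    public $hasMany = array('Image');
}

class Image extends AppModel {
    // public $belongsTo = array('Article'); // removed: this direction was never used
}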
2 - Controller
In my paginate array I had 'recursive' => 2, which was causing hundreds of extra queries per page. I was able to cut the queries on a heavy page down from 900 to 20 by changing this option to 'recursive' => 1. There were about six many-to-many relations on the model in question, so this recursion level must have slipped in inadvertently at some stage.
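A minimal sketch of the change (the other pagination options here are illustrative):
public $paginate = array(
    'limit' => 20,    // illustrative
    'recursive' => 1  // was 2, which pulled in associations of associations
);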
I still feel it is odd for CakePHP to serialize PHP in the cached view files. A more optimal approach would be static HTML files without any PHP.

Related

The ML.net prediction has HUGE differences compared with Custom Vision

I've trained an object detection model using Azure Custom Vision, exported the model as ONNX, and then imported the model into my WPF (.NET Core) project.
I use ML.net to get a prediction from my model, and I found the result differs HUGELY from the prediction I saw on Custom Vision.
I've tried different orders of extraction (ABGR, ARGB, etc.), but the results are very disappointing. Can anyone give me some advice? There is not much documentation online about using Custom Vision's ONNX model with WPF to do object detection.
Here's a snippet:
// Model creation and pipeline definition for images needs to run just once, so calling it from the constructor:
var pipeline = mlContext.Transforms
    .ResizeImages(
        resizing: ImageResizingEstimator.ResizingKind.Fill,
        outputColumnName: MLObjectDetectionSettings.InputTensorName,
        imageWidth: MLObjectDetectionSettings.ImageWidth,
        imageHeight: MLObjectDetectionSettings.ImageHeight,
        inputColumnName: nameof(MLObjectDetectionInputData.Image))
    .Append(mlContext.Transforms.ExtractPixels(
        colorsToExtract: ImagePixelExtractingEstimator.ColorBits.Rgb,
        orderOfExtraction: ImagePixelExtractingEstimator.ColorsOrder.ABGR,
        outputColumnName: MLObjectDetectionSettings.InputTensorName))
    .Append(mlContext.Transforms.ApplyOnnxModel(
        modelFile: modelPath,
        outputColumnName: MLObjectDetectionSettings.OutputTensorName,
        inputColumnName: MLObjectDetectionSettings.InputTensorName));

// Create an empty DataView. We just need the schema to call Fit():
var emptyData = new List<MLObjectDetectionInputData>();
var dataView = mlContext.Data.LoadFromEnumerable(emptyData);

// Generate a model.
var model = pipeline.Fit(dataView);
Then I use the model to create a prediction engine and run a prediction:
// Create the prediction engine.
var predictionEngine = _mlObjectDetectionContext.Model.CreatePredictionEngine<MLObjectDetectionInputData, MLObjectDetectionPrediction>(_mlObjectDetectionModel);

// Load tag labels.
var labels = File.ReadAllLines(LABELS_OBJECT_DETECTION_FILE_PATH);

// Create input data.
var imageInput = new MLObjectDetectionInputData { Image = this.originalImage };

// Predict.
var prediction = predictionEngine.Predict(imageInput);
Can you check that the image input (imageInput) is resized to the same size as the model requires when you prepare the pipeline, for both Resize parameters:
imageWidth: MLObjectDetectionSettings.ImageWidth,
imageHeight: MLObjectDetectionSettings.ImageHeight.
Also, the ExtractPixels parameters, especially ColorBits and ColorsOrder, should follow the model requirements.
Hope this helps,
Arif
Maybe it is because the aspect ratio is not preserved during the resize.
Try with an image of exactly this size:
MLObjectDetectionSettings.ImageWidth * MLObjectDetectionSettings.ImageHeight
and you will see much better results.
I think Azure does preliminary processing on the image, maybe padding (also during training?) or cropping.
Maybe during processing it also uses a moving window (of the size the model expects) and then does some aggregation.
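For illustration, a minimal sketch of letterboxing the input to the exact model size before prediction, which preserves the aspect ratio by padding; the PadToModelSize helper is hypothetical and only approximates the preprocessing speculated about above:
using System;
using System.Drawing;

// Hypothetical helper: scale the image to fit the model dimensions while
// preserving aspect ratio, and pad the remainder with black.
private Bitmap PadToModelSize(Bitmap original)
{
    int w = MLObjectDetectionSettings.ImageWidth;
    int h = MLObjectDetectionSettings.ImageHeight;
    float scale = Math.Min((float)w / original.Width, (float)h / original.Height);
    int newW = (int)(original.Width * scale);
    int newH = (int)(original.Height * scale);

    var canvas = new Bitmap(w, h);
    using (var g = Graphics.FromImage(canvas))
    {
        g.Clear(Color.Black);
        g.DrawImage(original, (w - newW) / 2, (h - newH) / 2, newW, newH);
    }
    return canvas;
}

// Usage:
// var imageInput = new MLObjectDetectionInputData { Image = PadToModelSize(this.originalImage) };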

CakePHP 3.4: how to cache virtual fields

Is there a way to cache virtual fields? I mean automatically, with the entity to which they belong, because, as I understand it, even if an entity is retrieved from the cache, its virtual fields are regenerated every time they are accessed.
Obviously I know I can take care of it myself, for example:
protected function _getFullName()
{
    $fullName = Cache::read('full_name_for_' . $this->_properties['id'], 'users');

    if (empty($fullName)) {
        $fullName = $this->_properties['first_name'] . ' ' . $this->_properties['last_name'];
        Cache::write('full_name_for_' . $this->_properties['id'], $fullName, 'users');
    }

    return $fullName;
}
But I wanted to know if in fact CakePHP can do it directly.
EDIT
Context.
The Post entity has the text property. text can contain images (as HTML code), even remote ones. Now I have to store somewhere the URL of the first image contained in the text, along with its size. So I have created the first_image virtual field, which uses a regex. The problem is rather with the image size: I cannot run the getimagesize() function on every access, especially if the image is remote, for reasons that you can easily understand. So how should I do this?
Is there a way to cache virtual fields?
No.
And what you are doing doesn't make much sense: in this case the caching is almost certainly causing more overhead than it saves.
Use CONCAT() at the DB level to concatenate the name instead, as sketched below.
Also, if there were a real need to cache a virtual property, I would say something has clearly gone wrong in the architecture.
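A minimal sketch of the DB-level concatenation, assuming a UsersTable instance in $users (field names follow the question):
$query = $users->find();
$query->select([
    'id',
    // Computed in SQL, so no virtual field needs to run in PHP:
    'full_name' => $query->func()->concat([
        'first_name' => 'identifier',
        ' ',
        'last_name' => 'identifier',
    ]),
]);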
It makes sense to me to want to prevent code in accessors/virtual fields from being executed more than once on the same request, which can easily happen if you use them several places in your script.
You can do a solution like this, but I'm not entirely sure how kosher it is:
private $fullName_cache = false;

protected function _getFullName()
{
    if (!$this->fullName_cache) {
        $fullName = Cache::read('full_name_for_' . $this->_properties['id'], 'users');

        if (empty($fullName)) {
            $fullName = $this->_properties['first_name'] . ' ' . $this->_properties['last_name'];
            Cache::write('full_name_for_' . $this->_properties['id'], $fullName, 'users');
        }

        $this->fullName_cache = $fullName;
    }

    return $this->fullName_cache;
}
I think there might be a nicer way to do this. There is mention of this sort of thing in the cookbook:
Code in your accessors is executed each time you reference the field. You can use a local variable to cache it if you are performing a resource-intensive operation in your accessor like this: $myEntityProp = $entity->my_property.
Anyone had luck implementing this?
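For what it's worth, a minimal sketch of the cookbook's local-variable suggestion, assuming a $user entity with the accessor above:
// The accessor runs once here; reuse the local variable afterwards
// instead of reading $user->full_name repeatedly.
$fullName = $user->full_name;

foreach ($posts as $post) {
    echo $fullName . ' wrote: ' . $post->title;
}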

How to optimize one-to many queries in the datastore

I have a latency problem in my application because the datastore performs additional queries for referenced entities. I have received good advice on how to handle this for single-value properties by using the get_value_for_datastore() function. However, my application also has one-to-many relationships, as shown in the code below, and I have not found a way to prefetch these entities. The result is unacceptable latency when trying to show a table of 200 documents and their associated DocumentFiles (>6000ms).
(There will probably never be more than 10,000 Documents or DocumentFiles.)
Is there a way to solve this?
models.py
class Document(db.Expando):
    title = db.StringProperty()
    lastEditedBy = db.ReferenceProperty(DocUser, collection_name='documentLastEditedBy')
    ...

class DocUser(db.Model):
    user = db.UserProperty()
    name = db.StringProperty()
    hasWriteAccess = db.BooleanProperty(default=False)
    isAdmin = db.BooleanProperty(default=False)
    accessGroups = db.ListProperty(db.Key)
    ...

class DocumentFile(db.Model):
    description = db.StringProperty()
    blob = blobstore.BlobReferenceProperty()
    created = db.DateTimeProperty()  # needs to be stored here in relation to upload / download of everything
    document = db.ReferenceProperty(Document, collection_name='files')

    @property
    def link(self):
        return '<a href="%s">%s</a>' % (self.key().id(), self.blob.filename)
    ...
main.py
docUsers = DocUser.all()
docUsersNameDict = dict([(i.key(), i.name) for i in docUsers])

documents = Document.all()
for d in documents:
    out += '<td>%s</td>' % d.title
    docUserKey = Document.lastEditedBy.get_value_for_datastore(d)
    out += '<td>%s</td>' % docUsersNameDict.get(docUserKey)
    out += '<td>'
    # Creates a new query for each document, resulting in unacceptable latency
    for file in d.files:
        out += file.link + '<br>'
    out += '</td>'
Denormalize and store the link in your Document, so that getting the link will be fast.
Be careful though: when you update a DocumentFile, you also need to update the associated Document. This works under the assumption that you read the link from the datastore far more often than you update it.
Denormalizing is often the fix for poor performance on App Engine.
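A minimal sketch of that denormalization, assuming a new (hypothetical) fileLinks property on Document:
class Document(db.Expando):
    title = db.StringProperty()
    lastEditedBy = db.ReferenceProperty(DocUser, collection_name='documentLastEditedBy')
    fileLinks = db.StringListProperty()  # hypothetical: denormalized copies of DocumentFile.link

# Call this whenever a DocumentFile belonging to the document changes:
def refresh_file_links(document):
    document.fileLinks = [f.link for f in document.files]
    document.put()

Rendering the table then reads d.fileLinks directly, with no per-document query.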
Load your files asynchronously. Use get_value_for_datastore on d.files, which should return a collection of keys, on which you can then call db.get_async(keys) to obtain a future object. You will not be able to write out your result procedurally as you have done, but it should be trivial to assemble a partial request / dictionary for all documents, with a collection of pending future get()s; then, when you iterate to build the results, you can finalize the futures, which will have finished without blocking (~0ms latency).
Basically, you need two iterations: the first iteration goes through and asynchronously requests the files you need, and the second iteration goes through, finalizes your gets, and builds your response.
https://developers.google.com/appengine/docs/python/datastore/async
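A minimal sketch of the two-pass idea, assuming the models above (the exact async entry point varies; this version starts one keys-only query per document, then batch-gets a document's files in a single call):
from google.appengine.ext import db

documents = Document.all().fetch(200)

# Pass 1: start an asynchronous keys-only query per document.
pending = [(d, DocumentFile.all(keys_only=True)
                           .filter('document =', d.key())
                           .run())  # run() begins fetching in the background
           for d in documents]

# Pass 2: resolve each query and batch-get the files in one round trip.
for d, file_keys in pending:
    files = db.get(list(file_keys))
    for f in files:
        out += f.link + '<br>'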

How to force a Drupal function to not use the DB cache?

I have a module and I am using node_load(array('nid' => arg(1)));
The problem is that this function keeps getting its data for node_load from the DB cache.
How can I force this function to not use the DB cache?
Example
My link is http://mydomain.com/node/344983
Now:
$node = node_load(array('nid' => arg(1)), null, true);
echo $node->nid . " -- " . arg(1);
Output:
435632 -- 435632
which is a random node id (available on the system), and every time I Ctrl+F5 my browser I get a new nid!!
Thanks for your help
Where are you calling this? For example, are you using it as part of your template.php file, as part of a page, or as an external module?
Unless you have this wrapped in a function with its own namespace, try naming the variable differently than $node -- for example, name it $my_node. Depending on the context, the 'node' name is very likely to be accessed and modified by Drupal core and other modules.
If this is happening inside of a function, try the following and let me know what the output is:
$test_node_1 = node_load(344983); // Any hard-coded $nid that actually exists
echo $test_node_1->nid;
$test_node_2 = node_load(arg(1)); // Consider using hook_menu loaders instead of arg() in the future, but that's another discussion
echo $test_node_2->nid;
$test_node_3 = menu_get_object(); // Another method that is better than arg()
echo $test_node_3->nid;
Edit:
Since you're using hook_block, I think I see your problem -- the block itself is being cached, not the node.
Try setting BLOCK_NO_CACHE or BLOCK_CACHE_PER_PAGE in hook_block, per the documentation at http://api.drupal.org/api/drupal/developer--hooks--core.php/function/hook_block/6
You should also try to avoid arg() whenever possible -- it's a little bit of a security risk, and there are better ways to accomplish just about anything arg() would do in a module environment.
Edit:
Some sample code that shows what I'm referring to:
function foo_block($op = 'list', $delta = 0, $edit = array()) {
    switch ($op) {
        case 'list':
            $blocks[0] = array(
                'info' => 'I am a block!',
                'status' => 1,
                'cache' => BLOCK_NO_CACHE // Add this line
            );
            return $blocks;
        case 'view':
            .....
    }
}
node_load uses db_query, which uses mysql_query -- so there's no way to easily change the database's cache through that function.
But, node_load does use Drupal's static $nodes cache -- it's possible that this is your problem instead of the database's cache. You can have node_load clear that cache by calling it with $reset = TRUE (node_load($nid, NULL, TRUE)).
Full documentation is on the node_load manual page at http://api.drupal.org/api/drupal/modules--node--node.module/function/node_load/6
I have had luck passing the node id to node_load directly, not in an array:
node_load(1);
According to Drupal's API this is acceptable, and it looks like if you pass an array as the first argument it is treated as an array of conditions to match against in the database query.
The issue is not with arg(), your issue is that you have caching enabled for anonymous users.
You can switch off caching, or you can exclude your module's menu items from the cache with the cache exclude module.
edit: As you've now explained that this is a block, you can use BLOCK_NO_CACHE in hook_block to exclude your block from the block cache.

Minimizing disk accesses when getting attributes of files in a directory

As the title suggests, I'm looking for a way to get attributes of a large number of files in a directory, but without adding the cost of an additional disk access for each file.
For example, if I get the Name attribute of FileInfo objects in a collection, then there is no additional disk access. However if I get the LastWriteTimeUtc, then an additional disk access is made.
My code:
DirectoryInfo di = new DirectoryInfo(myDir);
FileInfo[] allFiles = di.GetFiles("*.*", SearchOption.TopDirectoryOnly);

foreach (FileInfo fInfo in allFiles)
{
    var name = fInfo.Name;                // no additional disk access made
    var lastMod = fInfo.LastWriteTimeUtc; // further disk access made!!!
}
Does anyone know of a way I can get this information in one round trip? I would have hoped that DirectoryInfo.GetFiles() does this but no luck.
Thanks in advance.
If you really care about this, you should probably write this in C using FindFirstFile/GetFileTime, etc.
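For illustration, a minimal sketch of that approach, assuming Windows: FindFirstFile/FindNextFile fill a WIN32_FIND_DATA that already contains ftLastWriteTime, so no per-file call such as GetFileTime is needed (the directory path is illustrative):
#include <windows.h>
#include <stdio.h>

int main(void)
{
    WIN32_FIND_DATAA fd;
    HANDLE h = FindFirstFileA("C:\\myDir\\*.*", &fd);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    do {
        SYSTEMTIME st;
        // The timestamp is already in fd: no extra disk access per file.
        FileTimeToSystemTime(&fd.ftLastWriteTime, &st);
        printf("%s  %04u-%02u-%02u %02u:%02u UTC\n",
               fd.cFileName, st.wYear, st.wMonth, st.wDay, st.wHour, st.wMinute);
    } while (FindNextFileA(h, &fd));

    FindClose(h);
    return 0;
}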
So, this happens by design: LastWriteTimeUtc is lazily loaded. So there is nothing to do other than write my own component.
