How to bypass JPA level2 cache for query and find - google-app-engine

I'm using JPA2 with the level 2 cache enabled (jcache/memcache) on GAE. I have to run some update transactions and would like them to rely on datastore data rather than cached data. I tried setting the javax.persistence.cache.retrieveMode property to BYPASS when using the JPA find method, but it doesn't seem to work at all. So I wonder: is cache bypass possible with DataNucleus JPA2?
Code sample:
if (bypassCache) {
    return find(className, Collections.<String,Object>singletonMap("javax.persistence.cache.retrieveMode", CacheRetrieveMode.BYPASS));
} else {
    return find(className);
}
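For reference, here is the standard JPA2 form of that call as a minimal sketch (MyEntity, em, and id are hypothetical placeholders; the three-argument find() overload and CacheRetrieveMode are standard JPA2 API):

import java.util.Collections;
import java.util.Map;
import javax.persistence.CacheRetrieveMode;

// Standard JPA2: pass the property map to the three-argument find().
Map<String, Object> props = Collections.<String, Object>singletonMap(
        "javax.persistence.cache.retrieveMode", CacheRetrieveMode.BYPASS);
MyEntity entity = em.find(MyEntity.class, id, props);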
I'm using DataNucleus 3.1.3 and App Engine 1.7.7.1.
Thanks!
After reading the DataNucleus source code, I understand that the JPA cache bypass is currently not implemented for the find methods. Could anyone confirm?
In fact, it seems that EntityManager.find() results are always L2 cached regardless of the properties you set. I ran a JPQL query and its result wasn't L2 cached (datanucleus.query.results.cached is false by default). So my understanding is that I should use queries to get fine-grained control over the L2 cache.
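A minimal sketch of that query-based workaround (MyEntity, em, and id are hypothetical placeholders; the hint name is the standard JPA2 property, though whether the provider honors it may vary by version):

import javax.persistence.CacheRetrieveMode;
import javax.persistence.TypedQuery;

// Fetch by primary key via JPQL instead of em.find(); query results
// are not L2 cached unless datanucleus.query.results.cached is true.
TypedQuery<MyEntity> query = em.createQuery(
        "SELECT e FROM MyEntity e WHERE e.id = :id", MyEntity.class);
query.setParameter("id", id);
// Standard JPA2 hint; a no-op if the provider ignores it.
query.setHint("javax.persistence.cache.retrieveMode", CacheRetrieveMode.BYPASS);
MyEntity fresh = query.getSingleResult();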

Related

How to implement solr and solarium in cakePHP

I have to integrate Solr search into one of my CakePHP projects. I have installed Solr and Solarium using Composer but could not find out how to start searching with Solr. Is there any example code for CakePHP?
The first thing you need to figure out is how to expose the Solarium API in your CakePHP application. Typically this means saving some third-party PHP files into the Vendor directory of your application (take a peek here for more information).
Once you've done this, you have two options:
1. Interact with the Solarium API directly in your controller's actions.
2. Implement a Solarium datasource so you can use CakePHP's model constructs.
Option 1
This option is less consistent with how the developers of CakePHP would like you to do MVC, and you will have to repeat a fair bit of code each time you want to put something into Solr or query it (e.g. connecting to the Solr database). If you have minimal interaction with your Solr database, then I would recommend going down this route. Perhaps you could wrap up your access in a separate helper class or function, so that instead of this:
public function myControllerAction() {
    // create a client instance
    $client = new Solarium\Client($config);
    // get a select query instance
    $query = $client->createQuery($client::QUERY_SELECT);
    // this executes the query and returns the result
    $resultset = $client->execute($query);
    // expose the result set to your view
    $this->set('records', $resultset);
}
you could have this:
public function myControllerAction() {
    $resultset = solarium_get_records();
    // expose the result set to your view
    $this->set('records', $resultset);
}
Option 2
This option is a bit more involved and requires you to write a Solarium datasource, just like the ones the developers have written for MySQL and Postgres. It does require you to thoroughly understand the inner workings of CakePHP's model engine, but by taking a look at how the other datasources work it shouldn't be rocket science. Rest assured that if you did this and made your code open source, other developers would love to use it in their own CakePHP applications!
The benefit of this approach is that you will successfully abstract your application from the specific database implementation. So if you decided you didn't fancy using Solr and preferred a different search engine, you could migrate your data, write a new datasource (if one didn't exist already) and you're all set.
This probably doesn't exactly answer your question, but it should steer you in the right direction and highlight some aspects you should consider.
I integrated Solr using Option 1. Option 2 is also doable, but due to time constraints I had to choose Option 1. You can include the Solarium vendor library directly, include its classes in your controllers wherever required, and use Solr's add/get queries.
There are basically three major steps:
1. Install Solr.
2. Install Solarium using Composer.
3. Use your scripts within controller or component files to get results.
You can get a complete reference and example codebase here:
http://findnerd.com/list/view/Solr-integration-in-CakePHP-with-solarium-client/1946/
Thanks.
I've integrated Solr using Sam's Option 2 (as DataSource)
https://github.com/Highstrike/cakephp-solr-datasource
You can also find instructions there on how to use it, along with examples.

Does google app engine java datastore cache JPA query results?

I am using DN3 and GAE 1.7.4.
I use JPA2, which according to the documentation has the level 2 cache enabled by default.
Here are my questions:
If I run a query that returns some objects, would these objects be put in the cache automatically by their ID?
If I run em.find() with the id of an object which has already been loaded by another query (createQuery().getResultList()), would it be available in the cache?
Do I need to run my em.find() or query in a transaction in order for the cache to kick in?
I need some clarification on how this cache works and how I could do my queries/finds/persists in order to make the best use of the cache.
Thanks
From Google App Engine: Using JPA with App Engine
Level2 Caching is enabled by default. To get the previous default behavior, set the persistence property datanucleus.cache.level2.type to none. (Alternatively include the datanucleus-cache plugin in the classpath, and set the persistence property datanucleus.cache.level2.type to javax.cache to use Memcache for L2 caching.)
As for your doubts, this depends on your query as well as on DataNucleus and GAE datastore adapter implementation specifics. As Carol McDonald suggested, I believe the best path to finding the answers to your questions is the JPA2 Cache interface, more specifically its contains method.
Run your query, get access to the Cache interface through the EntityManagerFactory and see if the Level 2 cache contains the desired entity.
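A minimal sketch of that check (MyEntity, entityId, and the persistence-unit name are placeholders for your own; Cache.contains is part of the standard JPA2 API):

import javax.persistence.Cache;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

EntityManagerFactory emf = Persistence.createEntityManagerFactory("transactions-optional");
// ... run your query through an EntityManager from this factory, then:
Cache cache = emf.getCache();
if (cache.contains(MyEntity.class, entityId)) {
    // the entity ended up in (or was already in) the L2 cache
}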
Enabling DataNucleus logs will also give you good hints about what is happening behind the scenes.
After debugging in the local GAE development mode, I figured out that the level 2 cache works. There is no need for a transaction begin/commit. The results of my simple query on primary keys, as well as em.find(), were put in the cache under their primary keys.
However, the default cache timeout in the local development server is only a few seconds, so I had to add this:
<property name="datanucleus.cache.level2.timeout" value="3600000" />
to persistence.xml.

solr healthcheck for >0 documents

Solr's default /admin/ping configuration, provided for load balancer health checks, integrates well with Amazon ELB health checks.
However, since we're using master-slave replication, when we provision a new node Solr starts up and replication happens, but in the meantime /admin/ping returns success before the index has replicated across from the master and contains documents.
We'd like nodes to be brought live only once they have completed the first replication and have documents. I don't see any way of doing this with the /admin/ping PingRequestHandler - it always returns success if the search succeeds, even with zero results.
Nor is there any way of matching (or not matching) expected text in the response with the ELB health check configuration.
How can we achieve this?
To expand on the nature of the problem here, the PingRequestHandler will always return success unless:
Its query results in an exception being thrown.
It is configured to use a healthcheck file, and that file is not found.
Thus my suggestion is that you configure the PingRequestHandler to use a healthcheck file. You can then use a cron job on your Solr system whose job is to check for the existence of documents and create (or remove) the healthcheck file accordingly. If the healthcheck file is not present, the PingRequestHandler will throw an HTTP 503, which should be sufficient for ELB.
The rough algorithm that I'd use (a runnable sketch follows the list)...
Every minute, query http://localhost:8983/solr/select?q=*:*
If numDocs > 0 then touch /path/to/solr-enabled
Else rm /path/to/solr-enabled (optional, depending on your strictness)
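Here is a sketch of that check as a small Java program you could run from cron (the URL and file path follow the steps above and are assumptions to adjust for your setup; error handling is minimal, and it assumes Solr's default XML response format, where an empty index reports numFound="0"):

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SolrHealthcheck {
    public static void main(String[] args) throws Exception {
        // Ask for zero rows; we only care about the numFound attribute.
        URL url = new URL("http://localhost:8983/solr/select?q=*:*&rows=0");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }
        File flag = new File("/path/to/solr-enabled");
        // The default XML response contains numFound="0" for an empty index.
        if (body.indexOf("numFound=\"0\"") == -1) {
            flag.createNewFile(); // touch the flag: index has documents
        } else {
            flag.delete(); // optional, depending on your strictness
        }
    }
}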
The healthcheck file can be configured in the <admin> block, and you can use an absolute path, or a filename relative to the directory from which you have started Solr.
<admin>
    <defaultQuery>solr</defaultQuery>
    <pingQuery>q=*:*</pingQuery>
    <healthcheck type="file">/path/to/solr-enabled</healthcheck>
</admin>
Let me know how that works out! I'm tempted to implement something similar for read slaves at Websolr.
I ran into an interesting solution here: https://jobs.zalando.com/tech/blog/zookeeper-less-solr-architecture-aws/?gh_src=4n3gxh1
It's basically a servlet that you could add to the Solr webapp and then check all of the cores to make sure they have documents.
I'm toying with a more sophisticated solution but haven't tested it/made much progress yet: https://gist.github.com/er1c/e261939629d2a279a6d74231ce2969cf
What I like about this approach (in theory) is the ability to check the replication status/success for multiple cores. If anyone finds an actual implementation of this approach please let me know!

How to prevent so many DESCRIBE queries in CakePHP

Unfortunately, the CakePHP documentation is horrible. I can't seem to figure out if it's possible to cache the results of these DESCRIBE queries in my live environment. All I can find on caching for CakePHP is caching for data within controllers...
Set debug to 0 in your core.php file. With debug > 0, Cake automatically checks for schema changes by running those DESCRIBE queries on every request; with debug at 0, the model schema is cached (in tmp/cache/models) and the DESCRIBE queries go away.

Running Solr in read-only mode

I think I'm missing something obvious here. I have to imagine a lot of people open up their Solr servers to other developers, yet don't want them to be able to modify the index.
Is there something in solrconfig.xml that can be set to effectively make the index read-only?
Update for clarification:
My goal is to use Solr with an existing Lucene index managed by another application. This works just fine, but I want to be sure Solr never tries to write to this index.
Exposing a Solr instance to the public internet is a bad idea. Even though you can strip out some components to make it read-only, it just wasn't designed with security in mind; it's meant to be used as an internal service, just as you wouldn't expose an RDBMS.
From the Solr Security wiki page:
First and foremost, Solr does not concern itself with security either at the document level or the communication level. It is strongly recommended that the application server containing Solr be firewalled such that the only clients with access to Solr are your own. A default/example installation of Solr allows any client with access to it to add, update, and delete documents (and of course search/read too), including access to the Solr configuration and schema files and the administrative user interface.
Even ajax-solr, a Solr client for JavaScript meant to run in a browser, recommends talking to Solr through a proxy.
Take guardian.co.uk, for example: it's well known that they use Solr for searching, but they built an API to let others access their content. This way they can define and control exactly what and how they want people to search for things.
Otherwise, any script kiddie can write a trivial loop to DoS your Solr instance and therefore bring down your site.
You can probably just remove the line that defines your solr.XmlUpdateRequestHandler in solrconfig.xml.
Replication is a nice way to set up a read-only instance while still being able to index. Just set up a master with restricted access and a slave that is read-only (by removing your XmlUpdateRequestHandler from the config). The slave will replicate from the master but won't accept any indexing directly.
UPDATE
I just read that in Solr 1.4 you can disable a component. I just tried it on the /update requestHandler and I was no longer able to index.
