Is there a way to force Solr to read the indexes every X minutes?

I am trying to use Solr to read and search through the indexes provided by another application. These indexes are copied to a NAS every 15 minutes.
Is there a way to force Solr to re-read the indexes every 15 minutes? Is there a way to set a searcher to expire or to be reloaded, perhaps using a CRON expression?
I am aware that I can reload the core... but I'm asking whether there is another way.
Thanks.

If you are able to write a CRON expression, it could be done that way:
Solr has an endpoint for reloading a core, so all you need is to hit this URI every X minutes.
Load a new core from the same configuration as an existing registered
core. While the "new" core is initializing, the "old" one will continue
to accept requests. Once it has finished, all new requests will go to
the "new" core, and the "old" core will be unloaded.
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
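If you would rather schedule the reload from inside a Java application than from an OS-level cron job, a minimal sketch could look like this (assuming SolrJ 4.x and a core named core0; CoreAdminRequest.reloadCore issues the same RELOAD call as the URL above):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CoreReloader {
    public static void main(String[] args) {
        // Point at the Solr root URL; CoreAdminRequest targets /admin/cores itself.
        final HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    CoreAdminRequest.reloadCore("core0", server);  // same as action=RELOAD&core=core0
                } catch (Exception e) {
                    e.printStackTrace();  // log and keep the schedule running
                }
            }
        }, 15, 15, TimeUnit.MINUTES);
    }
}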

Yes, you can use a CRON expression.
The DataImportHandler will allow you to update your Solr index based on your NAS indexes.
Look for the "delta-import" command, intended "for incremental imports and change detection":
http://<host>:<port>/solr/<collection_name>/dataimport?command=delta-import
Programmatically using a Client API like SolrJ:
// Assumes SolrJ 3.x (CommonsHttpSolrServer); in SolrJ 4.x use HttpSolrServer instead.
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/<collection_name>");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("command", "delta-import");       // incremental import
QueryRequest request = new QueryRequest(params);
request.setPath("/dataimport");              // the DIH request handler
server.request(request);
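Note that delta-import relies on a deltaQuery (and usually a deltaImportQuery) in your DIH configuration to detect which rows have changed since the last run.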

Related

How to set Data Import Handler and Scheduler using solrJ Client

I am new to Solr search; I have completed a simple search.
Now I want to index documents directly from a database, and I want to set up a scheduler or trigger to update the index whenever there is a change in the DB.
I know that I can do it with the DataImportHandler, but I don't understand its flow.
Can you help me with the steps I should take to start this process, or can anyone give me pointers?
I want to do all of this using the SolrJ client.
This task requires many parts to work together. Work through https://wiki.apache.org/solr/DataImportHandler
DataImportHandler is a Solr component, which means that it runs inside the Solr instance. All you have to do is configure Solr and then run the DIH through the Dataimport screen.
SolrJ, on the other hand, is an API that makes it easy for Java applications to talk to Solr, so you can write your own applications that create, modify, search, and delete documents in Solr.
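As a rough sketch of that second approach, assuming SolrJ 4.x and a hypothetical core named mycore (adjust the URL and field names to your schema):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/mycore");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
doc.addField("name", "example");
server.add(doc);          // index (or overwrite) the document
server.deleteById("2");   // remove a document by its unique key
server.commit();          // make the changes visible to searchers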
Try a simple edit or delete function on a button click event: send the id with the URL to a servlet and do your JDBC operation there;
once that has committed successfully, call your data-import command from SolrJ and redirect to your index page, as sketched below.
That's it.
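A minimal sketch of that flow, assuming SolrJ 4.x, a MySQL JDBC driver, and hypothetical names (mydb, items, mycore, index.jsp) that you would replace with your own:

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class DeleteRecordServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String id = req.getParameter("id");
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "password")) {
            // JDBC operation: delete the row (auto-commit is on by default)
            PreparedStatement ps = con.prepareStatement("DELETE FROM items WHERE id = ?");
            ps.setString(1, id);
            ps.executeUpdate();

            // Tell the DataImportHandler to pick up the change
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/mycore");
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("command", "delta-import");
            QueryRequest dih = new QueryRequest(params);
            dih.setPath("/dataimport");
            server.request(dih);
        } catch (Exception e) {
            throw new ServletException(e);
        }
        resp.sendRedirect("index.jsp");  // back to your index page
    }
}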

Way to set up a Google Cloud app that creates a cache, and possible request timeout?

I have an RSS feed from a 3rd-party website that creates a cache stored in cache folders. Every request takes 25-40 seconds the first time; after that it is served from the cache for 9-10 minutes.
Problem 1: GAE doesn't allow writing to the file system. So how should I provide caching?
Problem 2: A request takes 25-40 seconds every time the cache times out. How should I approach this?
Is there any way to sort this out, or do I need to use Google Compute Engine, which provides both facilities?
I read articles about this but found no direct answer to my question. I was stuck for 2 days before posting here. Thank you.
You can cache on App Engine by simply sending back the necessary cache headers in the HTTP response.
Query the next results outside the user's request using a task queue and prepare the new data to be served in the datastore and/or memcache. Subsequent user requests can then quickly serve the latest data that's available without the setup delay.
Instead of using the file system to store text, use a TextProperty in the NDB datastore. Give it a unique key so you can request it in the future.
This solves your caching problem, too. Requesting an entity by its key will use the built-in cache, or if it's not in the cache it will fetch it from the datastore.
Then add a cron job to update the datastore every ten minutes or so.
from google.appengine.ext import ndb

class RssFeed(ndb.Model):
    KEY = 'RSS'
    cache = ndb.TextProperty()
    last_updated = ndb.DateTimeProperty(auto_now=True)

    @classmethod
    def update(cls):
        # insert your code to fetch from the RSS feed
        # cache = ...
        rss_feed = cls.get_or_insert(cls.KEY)
        rss_feed.cache = cache
        rss_feed.put()

    @classmethod
    def fetch(cls):
        rss_feed = cls.get_by_id(cls.KEY)
        return rss_feed.cache
Call RssFeed.update() to update the cache and RssFeed.fetch() whenever you want the cached data.

Solr-Collection or Core not reloading schema.xml

I am using Solr 4.6 and changed something inside schema.xml. To update schema.xml inside my core I used zkcli, which works fine; I am able to see the modified schema.xml inside the Solr Admin GUI under cloud\tree\config\foobar\schema.xml.
But after calling
http://localhost:8983/solr/admin/collections?action=RELOAD&name=foobar and
http://localhost:8983/solr/admin/cores?action=RELOAD&name=foobar,
the old schema.xml was still in the core named foobar.
Your 2nd HTTP request to the Core API is wrong. Change name to core:
http://localhost:8983/solr/admin/cores?action=RELOAD&name=foobar should be
http://localhost:8983/solr/admin/cores?action=RELOAD&core=foobar.
http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.6.pdf (page 277)
RELOAD
The RELOAD action loads a new core from the configuration of an existing, registered Solr core. While the new core is initializing, the existing one
will continue to handle requests. When the new Solr core is ready, it takes over and the old core is unloaded.
This is useful when you've made changes to a Solr core's configuration on disk, such as adding new field definitions. Calling the RELOAD action
lets you apply the new configuration without having to restart the Web container. However the Core Container does not persist the SolrCloud
solr.xml parameters, such as solr/@zkHost and solr/cores/@hostPort, which are ignored.
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
The RELOAD action accepts a single parameter, core, which is the name of the core to be reloaded.
see also https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-RELOAD
You have to reload your cores after giving them a new schema.
Replace name with core in your query:
/solr/admin/cores?action=RELOAD&core=yourcorename
For example
http://localhost:8983/solr/admin/cores?action=RELOAD&core=foobar

How to implement Solr to index a MySQL database in Java?

I want to use Solr to index a MySQL database so that I can perform a faster search of the data on my website. Can anyone help me with the code? I don't have any idea how to implement Solr in my code.
Your question is too broad. However, for a head start, you could have a look at DataImport in Solr.
You may want to check out the Solr Data Import Handler (DIH) module, which will help you index data from MySQL into Solr without writing any Java code.
If you have downloaded Solr, you can check out the example in solr-4.3.0/example/example-DIH (refer to the readme.txt), which will give you an idea of how the DIH is configured and how the indexing can be done.
// Assumes SolrJ 3.x (CommonsHttpSolrServer); params must be built before the request that uses them.
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

CommonsHttpSolrServer commonsHttpSolrServer = new CommonsHttpSolrServer("http://localhost:8983/solr");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("command", "full-import");        // rebuild the whole index
QueryRequest request = new QueryRequest(params);
request.setPath("/dataimport");              // the DIH request handler
commonsHttpSolrServer.request(request);
NOTE - The request sent is asynchronous, so you will receive an immediate response and will need to check the status to know whether the import has completed; see the sketch below.
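A hedged sketch of that status check, reusing the server and imports from above plus org.apache.solr.common.util.NamedList (the "busy"/"idle" values come from DIH's status response; the polling loop is just one way to wait):

// Poll the /dataimport handler until the import is no longer running.
ModifiableSolrParams statusParams = new ModifiableSolrParams();
statusParams.set("command", "status");
QueryRequest statusRequest = new QueryRequest(statusParams);
statusRequest.setPath("/dataimport");
String status = "busy";
while ("busy".equals(status)) {
    Thread.sleep(5000);  // wait between polls
    NamedList<Object> response = commonsHttpSolrServer.request(statusRequest);
    status = (String) response.get("status");  // "busy" while importing, "idle" when done
}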

Search using SOLR is not up to date

I am writing an application in which I present search capabilities based on SOLR 4.
I am facing strange behaviour: during massive indexing, search requests don't always "see" the newly indexed data. It seems the index reader is not being refreshed frequently, and only after I manually refresh the core from the Solr Core Admin window do the expected results return...
I am indexing my data using JsonUpdateRequestHandler.
Is it a matter of configuration? Do I need to configure Solr to reopen its index reader more frequently somehow?
Changes to the index are not available until they are committed.
For SolrJ, do
HttpSolrServer server = new HttpSolrServer(host);
server.commit();
For XML either send in <commit/> or add ?commit=true to the URL, e.g. http://localhost:8983/solr/update?commit=true
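If you would rather not manage explicit commits yourself, one alternative (a sketch, assuming SolrJ 4.x) is the commitWithin parameter, which asks Solr to commit on its own within the given number of milliseconds:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
// Ask Solr to commit this update within 10 seconds, batching commits for you.
server.add(doc, 10000);

Server-side, the autoCommit / autoSoftCommit settings in solrconfig.xml achieve a similar effect without any client changes.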
