How to reset Solr back to first use?

I'm having a lot of problems running my Solr server. When I have problems committing my CSV files (it's a 500 MB CSV) it throws up some error that I am never able to fix, which is why I try to wipe the entire index using
http://10.96.94.98:8983/solr/gettingstarted/update?stream.body=<delete><query>*:*</query></delete>&commit=true
But sometimes it just doesn't delete. In which case, I use
bin/solr stop -all
And then try again, but it still gives me errors when updating. So then I decided to extract the install tarball again, deleting all my previous Solr files. And that works!
I was wondering if there is a shorter way to go about this. I'm sure the index files aren't the only ones that get generated. Is there any revert-to-fresh-installation option?

If you are calling the update command against the right collection and you are committing, you should see the content deleted/reset. If that is not happening, I would check that the server/collection you are querying is actually the same one you are executing your delete command against (here, gettingstarted). If that does not work, you may have found a bug, but that is unlikely.
If you really want to delete the collection, you can unload it on the Admin UI's Core page and then delete it from disk. To see where the collection lives, look at the core's Overview page: on the right-hand side you will see an Instance variable with the path to your core's directory. It could be, for example, .../solr-6.1.0/example/techproducts/solr/techproducts. Deleting that directory after unloading the core will get rid of everything there.
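The same cleanup can be done from the command line; a minimal sketch, assuming a standalone Solr on localhost:8983 and a core named gettingstarted (swap in your own host, core name, and the instance directory shown on the Overview page):
# Unload the core via the Core Admin API (same effect as the Admin UI's Unload button)
$ curl 'http://localhost:8983/solr/admin/cores?action=UNLOAD&core=gettingstarted'
# Then remove the instance directory from disk
# (the path below is a placeholder, not taken from your setup)
$ rm -rf /path/to/solr/server/solr/gettingstarted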

Related

Posting a large directory of files to Solr using the post tool: how to commit after every file

I am using the Java post tool for Solr to upload and index a directory of documents. There are several thousand documents. Solr only does a commit at the very end of the process, and sometimes things stop before it completes, so I lose all the work.
Does anyone have a technique to fetch the name of each doc and call post on it, so you get a commit for each document, rather than one large commit of all the docs at the end?
From the help page for the post tool:
Other options:
..
-params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
This should allow you to use -params "commitWithin=1000" to make sure each document shows up within one second of being added to the index.
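For example, with the bin/post wrapper shipped with Solr (a sketch; the collection name gettingstarted and the documents path are placeholders for your own):
# Pass commitWithin through to every update request, so each file posted
# becomes searchable within ~1 second even if a later file aborts the run
$ bin/post -c gettingstarted -params "commitWithin=1000" /path/to/documents/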
Committing after each document is overkill for performance; in any case, it's quite strange that you have to resubmit everything from the start if something goes wrong. I suggest seriously rethinking the indexing strategy you're using instead of investigating a different way to commit.
That said, if you have no other way to change the commit configuration, I suggest configuring autoCommit in your Solr collection/index or using the commitWithin parameter, as suggested by MatsLindh. Just check whether the tool you're using allows adding this parameter.
autoCommit
These settings control how often pending updates will be automatically pushed to the index. An alternative to autoCommit is to use commitWithin, which can be defined when making the update request to Solr (i.e., when pushing documents), or in an update RequestHandler.
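If server-side autoCommit suits you better than per-request commitWithin, one way to set it without hand-editing solrconfig.xml is the Config API; a sketch, assuming Solr 5+ and a collection named gettingstarted (both assumptions):
# Auto-commit pending updates at most every 15 seconds (value in milliseconds)
$ curl 'http://localhost:8983/solr/gettingstarted/config' -H 'Content-Type: application/json' -d '{"set-property": {"updateHandler.autoCommit.maxTime": 15000}}'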

Is there an easy way to delete a complete Vespa document set?

Playing with Yahoo's vespa.ai, I'm now at a point where I have a search definition I am happy with, but I still have a bunch of garbage test documents stored.
Is there an easy way to delete/purge/drop all of them at once, à la SQL DROP TABLE or DELETE FROM X?
The only place I have found so far where deleting documents is clearly mentioned is the Document JSON format page. As far as I understand, it requires deleting documents one by one, which is fine but gets a bit cumbersome when one is just playing around.
I tried deleting the application via the Deploy API using the default tenant, but the data is still there when issuing search requests.
Did I miss something? Or is this by design?
There's no API available to do this, but the vespa-remove-index command-line tool could help you out. That is, to drop everything:
$ vespa-stop-services
$ vespa-remove-index
$ vespa-start-services
You could also play around with using garbage collection for this, but I wouldn't go down this path unless you are unable to use vespa-remove-index.
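If stopping services isn't acceptable, the one-document-at-a-time route the question mentions can at least be scripted through the Document v1 REST API; a sketch, assuming the default API port 8080 and a namespace and document type both named music (all three are placeholders for your own setup):
# Delete one document by id; repeat (or loop) for each garbage test document
$ curl -X DELETE 'http://localhost:8080/document/v1/music/music/docid/test-doc-1'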

Solr 6.4: Cannot unload core via API or Admin Panel

The problem: I tried to replace a core by creating a new one with a different name, swapping them, and then UNLOADing the old one, but it failed.
Now, even when trying to clean everything up manually (unloading the cores with the AdminPanel, or via curl using deleteIndexDir=true&deleteInstanceDir=true, and deleting the physical directories of both cores), nothing works.
If I UNLOAD the cores using the AdminPanel, I no longer see the cores listed. But the STATUS command still returns this:
$ curl -XGET 'http://localhost:8983/solr/admin/cores?action=STATUS&core=mycore&wt=json'
{"responseHeader":{"status":0,"QTime":0},"initFailures":{},"status":{"mycore":{"name":"mycore","instanceDir":"/var/solr/data/mycore","dataDir":"data/","config":"solrconfig.xml","schema":"schema.xml","isLoaded":"false"}}}
But, if I try to UNLOAD the core via curl:
$ curl -XGET 'http://localhost:8983/solr/admin/cores?action=UNLOAD&deleteIndexDir=true&deleteInstanceDir=true&core=mycore&wt=json'
{"responseHeader":{"status":0,"QTime":0}}
and there is no effect. I still see the core listed in the AdminPanel, STATUS returns exactly the same thing, and of course if I try to access the cores, errors start popping up telling me that solrconfig.xml doesn't exist. Of course nothing exists.
I know that if I restart Solr everything will be fine. But I cannot restart Solr in production every time it gets dirty on its own (and it does, very often).
Some time ago I made a comment here but I didn't get any useful reply.
Now, the real problem is that in production there are other cores working and to restart Solr it takes about half an hour, which is not ok at all.
So, the question is how to clean up unloaded cores properly WITHOUT restarting Solr. Please, before saying "no, it's not possible", try to understand the business requirement. It MUST be possible somehow. If you know the reason why it's not possible, let's start thinking together about how it could be possible.
UPDATE
I'm adding some errors I've found in the logs; I hope they help:
Solr init error
Solr create error
Solr duplicate requestid error (my script tried twice using the same id)
Solr closing index writer error
Solr error opening new searcher
I've just noticed that the error opening the searcher and the error creating the core are related: both have Caused by: java.nio.file.FileAlreadyExistsException: /var/solr/data/mycore/data/index/write.lock
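Given that exception, one plausible culprit is a stale write.lock left behind by the failed unload. A sketch of clearing it and re-creating the core without a restart (this is an assumption, not a verified fix; make sure no live IndexWriter still holds the lock before removing the file):
# Remove the stale lock file from the index directory
$ rm /var/solr/data/mycore/data/index/write.lock
# Re-create the core against the existing instance directory
$ curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&instanceDir=/var/solr/data/mycore'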

solr/browse gives a page-not-found error

How do I make the browse page load? I have added the handler as described on this page:
https://wiki.apache.org/solr/VelocityResponseWriter
It's still not working. Can anyone brief me on this? Thanks in advance.
A couple of things to check:
Have you restarted Solr?
Is the core you are trying to 'browse' a default core? If not, you need to include the core name in the URL. E.g. /solr/collection1/browse
Are your library statements in solrconfig.xml pointing at the right Velocity jar? Use an absolute path unless you are very sure you know what your base directory is for the relative paths.
Are you getting any errors in the server logs?
If all else fails, start comparing what you have with the collection1 example in the Solr distribution. It works there, so you can compare the relevant entries nearly line by line, and even experiment with collection1 to make it more like your failing setup.
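As a quick sanity check for the second point, hit the handler with the core name in the path; a sketch, assuming a local Solr with the stock collection1 example core:
# A working handler returns the Velocity-rendered HTML page, not a 404
$ curl 'http://localhost:8983/solr/collection1/browse'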

"No need to update index.yaml" floods console

At the moment as I debug my App Engine server, I'm often starting it up with the instruction to clear the datastore and then firing a couple of KB of data at it in the hope of figuring out why some of the reports I've written aren't generating properly.
However, one thing that's getting in the way of development, and also raising some slight concern, is that the console floods with the following output:
DEBUG 2012-07-13 11:44:34,033 datastore_stub_index.py:181] No need to update index.yaml
DEBUG 2012-07-13 11:44:34,221 datastore_stub_index.py:181] No need to update index.yaml
DEBUG 2012-07-13 11:44:34,406 datastore_stub_index.py:181] No need to update index.yaml
DEBUG 2012-07-13 11:44:34,601 datastore_stub_index.py:181] No need to update index.yaml
I've got two questions: should I be concerned about the flood of messages indicating that index.yaml does not need to be changed, and if not, is there a way to suppress the warning? If I should be concerned, could someone point me in the right direction?
There's no need for concern; it just indicates that the devserver doesn't need to add new items to the index.yaml file. This is explained in more detail here:
Every datastore query made by an application needs a corresponding index. Indexes for complex queries must be defined in a configuration file named index.yaml.
The development web server automatically adds items to this file when the application tries to execute a query that needs an index that does not have an appropriate entry in the configuration file.
If I'm not mistaken, this should only be printed when the --debug flag is passed to the devserver, so maybe it is set as an option in the tool you use to invoke the devserver.
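For example, with the old Python SDK's dev_appserver.py (a sketch; the flag names assume that SDK, so check your own launcher), dropping the debug flag keeps the datastore-clearing behaviour while silencing the DEBUG lines:
# --clear_datastore wipes local data on startup; omitting -d/--debug
# leaves the log level above DEBUG, so the index.yaml messages no longer print
$ dev_appserver.py --clear_datastore myapp/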
