Solr change schema

I am running Solr 4.1, and I need to change my schema. I modified the schema file, but it doesn't seem to pick up the changed schema. I even rebooted my server.
Also, how do I view my schema? I tried curl http://localhost:1080/Mark/names/schema?wt=json and I get:
The origin server did not find a current representation
for the target resource or is not willing to disclose that one exists.
I can curl the same endpoint and get a document just fine.
How do I get Solr to pick up the new schema?

I had an error in my code that was indexing the data. I was setting the values for the new fields, but I did not realize that another method, buried deep in the bowels :), was stripping them out.
Turns out Solr picked up the schema changes just fine; I simply wasn't sending the data for the new fields.
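For reference, in Solr 4.x a modified schema.xml is only picked up after the core is reloaded (or Solr restarted); a reload can be triggered through the CoreAdmin API. A minimal sketch, assuming the core is named names and Solr answers on localhost:8983 (adjust host and port to your setup):
$ curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=names"
As for the error from the /schema endpoint: if I remember correctly, the Schema REST API only arrived in a later 4.x release, so on 4.1 the schema is easiest to inspect through the Files section of the admin UI.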

Posting large directory of files to SOLR using post tool, how to commit after every file

I am using the Java post tool for Solr to upload and index a directory of documents. There are several thousand documents. Solr only does a commit at the very end of the process, and sometimes things stop before it completes, so I lose all the work.
Does anyone have a technique to fetch the name of each doc and call post on it, so you get a commit for each document, rather than one large commit of all the docs at the end?
From the help page for the post tool:
Other options:
..
-params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
This should allow you to use -params "commitWithin=1000" to make sure each document shows up within one second of being added to the index.
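For example, with the bin/post wrapper the invocation might look like this (the collection name and path are placeholders):
$ bin/post -c mycollection -params "commitWithin=1000" /path/to/docs
Each document should then become searchable within roughly a second of being added, without forcing a hard commit per file.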
Committing after each document is overkill for performance. In any case, it's quite strange that you have to resubmit everything from the start if something goes wrong; I suggest seriously reconsidering the indexing strategy you're using rather than investigating a different way to commit.
That said, if you have no other way to change the commit configuration, I suggest configuring autoCommit in your Solr collection/index or using the commitWithin parameter, as suggested by MatsLindh. Just check whether the tool you're using lets you add this parameter.
autoCommit
These settings control how often pending updates will be automatically pushed to the index. An alternative to autoCommit is to use commitWithin, which can be defined when making the update request to Solr (i.e., when pushing documents), or in an update RequestHandler.
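To illustrate the commitWithin route on a raw update request (the collection and field names below are made up):
$ curl "http://localhost:8983/solr/mycollection/update?commitWithin=10000" -H "Content-Type: application/json" -d '[{"id":"doc1","title":"example"}]'
The document is then committed within ten seconds, while Solr stays free to batch commits for efficiency.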

Is there an easy way to delete a complete Vespa document set?

Playing with Yahoo's vespa.ai, I'm now at a point where I have a search definition with which I am happy, but I still have a bunch of garbage test documents stored.
Is there an easy way to delete/purge/drop all of them at once, à la SQL DROP TABLE or DELETE FROM X?
The only place I've found so far where deleting documents is clearly mentioned is the Document JSON format page. As far as I understand, it requires deleting documents one by one, which is fine, but gets a bit cumbersome when one is just playing around.
I tried deleting the application via the Deploy API using the default tenant, but the data is still there when issuing search requests.
Did I miss something? or is this by design?
There's no API available to do this, but the vespa-remove-index command line tool could help you out. I.e., to drop everything:
$ vespa-stop-services
$ vespa-remove-index
$ vespa-start-services
You could also play around with using garbage collection for this, but I wouldn't go down this path unless you are unable to use vespa-remove-index.
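For completeness, individual documents can be removed through the Document API mentioned in the question; a minimal sketch, assuming the default container port 8080 and placeholder namespace, document type, and document id:
$ curl -X DELETE "http://localhost:8080/document/v1/mynamespace/mydoctype/docid/mydocid"
That is fine for cleaning up a handful of known documents, but for dropping everything the vespa-remove-index route above is far less tedious.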

Easy GUI way to add data to a Solr index

I'm working on the front end of an app that uses Solr for data storage. Currently I have an empty index, but it'd (understandably) be a lot easier for me if some dummy data were returned, so I could make sure it's output correctly on the front end.
If I were working with an RDBMS (let's say postgres), I'd open up a GUI (e.g. pgAdmin) and type data manually into a few rows to achieve this goal. I have access to the Solr web interface, but I can't see any obvious call to action saying INSERT YOUR DATA HERE. The closest thing I can find to an answer on the web is this SO thread, but it's still not quite the easy, GUI-based solution I'm looking for.
So, my question is: is there a way to quickly and easily insert some data, equivalent to the RDBMS method mentioned above?
Make sure you have defined a schema in schema.xml.
SOLR does indeed have a (limited) HTML GUI, which on a local installation is probably found at localhost:8983/solr (the default). If you can get to the base admin page, there is a small combobox on the left where you can select a core/collection. If you click on that, a list of options appears, and you can pick 'Documents' to get a GUI similar to what I think you expect from postgres/RDBMS/whatnot.
http://localhost:8983/solr/#/collection1/documents is the URL on a default SOLR installation that I have. This should work as long as you replace collection1 with your collection/core name and localhost:8983 with wherever your Solr is hosted and its port.
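Once you're on that Documents screen, you can paste a document into the Document(s) box in the default JSON format, for example (the field names here are made up and must match your schema):
{"id":"test-1","title":"Hello from the admin UI"}
Hit Submit Document and it should show up in queries shortly afterwards.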

SOLR 3.6.0, After a full re-index of a bunch of entities, some of my items are not making it into the SOLR index, but no logs are being generated

Using a StreamingUpdateSolrServer, I followed this algorithm to re-index my huge dataset into SOLR:
Initialize StreamingUpdateSolrServer server = new StreamingUpdateSolrServer(solrServerUrl, numDocsToAddInBatch, numOfThreads);
For each item:
-->Create a document
-->server.add(document)
When all items have been added:
server.commit();
server.optimize();
The problem:
Some of my items are not making it into the SOLR index, but no logs are being generated to tell me what happened.
I was able to find most of the documents, but some were missing. There were no errors in any logs, and I have substantial try/catch blocks with logging around all SOLRJ exceptions on the client side.
Verify logging is not being hidden for the SOLR WAR
You will definitely want to verify that the SOLR server log settings are not hiding the fact that documents are failing to be added to the index.
Because SOLR uses the SLF4J API, your SOLR server could be overriding the log settings that would otherwise let you see an error message when a document fails to be indexed.
If you have a custom {solr-war}/WEB-INF/classes/logging.properties, you will need to make sure those settings are not hiding the error messages.
By default, errors in adding an item should be shown automatically, so if you did not change your SOLR log settings at any point, you should see any errors that occur during indexing in your server log file.
Troubleshoot why Documents are failing to be indexed
In order to investigate this, it is helpful to run the following verification step any time after the indexing is complete:
Initialize new log log_fromsolr
Initialize new log log_notfound
For each Item…
-->Search SOLR for the item. If SOLR has the document, log the item's fields on a single line in log_fromsolr. This should include the uniqueKey for your document if you have one.
-->If the document cannot be found in SOLR for this item, write a line to log_notfound with all the fields from the object in the database, again supplying the uniqueKey first.
Once the verification step has completed, log_notfound contains a list of all documents that failed to be added to the index.
You can use log_fromsolr to compare the fields of a document that made it into the index against one that did not.
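A minimal command-line sketch of that verification, assuming the uniqueKey field is id, a core reachable at http://localhost:8983/solr/mycore, and a file ids.txt with one id per line (all of these names are placeholders, and ids containing special characters would need URL-encoding); it only records presence by id rather than every field:
while read -r id; do
  found=$(curl -s "http://localhost:8983/solr/mycore/select?q=id:$id&rows=0&wt=json" | grep -o '"numFound":[0-9]*' | cut -d: -f2)
  if [ "$found" = "0" ]; then echo "$id" >> log_notfound; else echo "$id" >> log_fromsolr; fi
done < ids.txt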
Verify it is not an intermittent issue
Sometimes it is not the same items that fail to be added to the index each time you run the indexing.
If you find objects in the log_notfound log, you will want to back up the current notfound log and run the indexing process again from scratch. Use a diff tool to see the differences between the first notfound log and the second notfound log.
An intermittent problem is evident when you see large numbers of differences in these files (Note: some differences are to be expected if new objects are being created in the database in between the first and second re-indexing).
If your problem is intermittent, it almost certainly points at application code that is not committing your SOLR transactions correctly.
The same documents consistently come up missing each time it indexes
At this point we have to compare the documents that are found in the SOLR index against the documents that never make it into the Lucene index. Usually a field-by-field comparison of the objects will start turning up some suspicious values that may be causing issues when adding the document to the index.
Try eliminating all the suspicious fields and then re-indexing the entire thing again. See if the documents are still failing to be indexed. If this worked, you will want to start re-introducing the fields that you removed and see if you can pinpoint the one that is the issue.

Django DatabaseError - What is the best way to debug?

I'm having some trouble with a Django database (postgresql backend). There was a model (a profile for users) in the project with some boilerplate stuff in it. This had sat in our project for a while, not being used. I finally got around to needing it, so I adjusted the models and created some initial migrations with South. On my dev box I dropped the entire DB, ran syncdb, and migrated. This worked fine.
When I pushed this out to production, I manually removed the old tables in postgresql, ran syncdb, and migrated. However, in my admin interface a DatabaseError is raised because some function is looking for a field on the old model. I've even dropped the entire postgresql database and run syncdb/migrate again, and this still happens. The offending field is called gender (not one that I created). The migration works, and the database structure reflects my models, but for some reason the admin interface wants to find this (non-existent) gender field. This is the error:
DatabaseError: column user_profiles.gender does not exist LINE 1: ... "user_profiles"."id", "user_profiles"."user_id", "user_prof...
I understand this seems to be quite site specific, but perhaps I could get some pointers on how to debug this?
Thanks
If I understand your problem correctly, your current code should no longer contain a reference to "gender", since the old code was removed. Try to find a source file that still contains it:
find your-dir -name '*.py' | xargs grep gender
Or there may be a stale .pyc file whose .py file has been removed, but Python still loads the .pyc file.
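To rule out stale bytecode, you can simply delete the compiled files; they are regenerated automatically the next time Python runs:
find your-dir -name '*.pyc' -delete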
If this does not help, please post the full plain-text traceback.
