django-haystack and solr setup issue (undefined field text error)

I am new to Solr. I have been following the documentation on the http://haystacksearch.org/ site.
My project is on Django 1.4.
The steps I followed:
1. Added haystack to INSTALLED_APPS.
2. Modified settings.py with:
HAYSTACK_SITECONF = 'directory.search_sites'
HAYSTACK_SEARCH_ENGINE = 'solr'
HAYSTACK_SOLR_URL = 'http://127.0.0.1:8983/solr'
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr'
        # ...or for multicore...
        # 'URL': 'http://127.0.0.1:8983/solr/mysite',
    },
}
3. My search_indexes.py file:
from haystack import indexes
from app.models import SellerItem


class SellerItemIndex(indexes.SearchIndex):
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    sub_title = indexes.CharField(model_attr='sub_title')
    description = indexes.CharField(model_attr='description')

    def get_model(self):
        return SellerItem

    def index_queryset(self):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(pk__gt=0)
4. Added search_sites.py:
import haystack
haystack.autodiscover()
5. Added templates/search/indexes/selleritem.txt:
{{ object.title }}
{{ object.sub_title }}
{{ object.description }}
6. Added this to urls.py:
(r'^search/', include('haystack.urls')),
7. Created the search template.
8. Replaced schema.xml in apache-solr-3.6.0/example/solr/conf with the XML generated by the command:
python manage.py build_solr_schema
I am getting an error like this when I start the Solr server:
SEVERE: org.apache.solr.common.SolrException: undefined field text
at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1330)
at org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getAnalyzer(IndexSchema.java:408)
at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:383)
at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:574)
at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:206)
at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1429)
at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1317)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1245)
at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1234)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:79)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:105)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:165)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:59)
at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1182)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
The server still starts despite this error.
When I run ./manage.py rebuild_index and then do a search, I get this error in the log:
Problem accessing /solr/select/. Reason:undefined field text
What did I miss? Has anyone had the same issue before?
Thank you

I think your issue stems from the incorrectly named template. You use search/indexes/selleritem.txt, but it should be search/indexes/app/selleritem_text.txt.
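To make the naming convention concrete, here is a minimal sketch that builds the expected template path from the app label, model name, and index field name. The helper name `index_template_path` is hypothetical and not part of Haystack; it only mirrors the search/indexes/<app_label>/<model_name>_<field_name>.txt pattern described above.

```python
def index_template_path(app_label, model_name, field_name):
    """Return the path where a use_template=True field expects its template.

    Hypothetical helper for illustration only; it is not a Haystack function.
    The model name is lowercased, as in the example path above.
    """
    return "search/indexes/{}/{}_{}.txt".format(
        app_label, model_name.lower(), field_name
    )


# The question's index: app "app", model SellerItem, document field "text".
expected_path = index_template_path("app", "SellerItem", "text")
print(expected_path)
```

For the index in the question, this gives the path named in the answer rather than search/indexes/selleritem.txt.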
As a side note, I see that you’re mixing Haystack 1.X and 2.X settings and methods. By the lack of the indexes.Indexable mixin in your SellerItemIndex search index class, it appears that you must actually be using 1.X. Your life will be simpler if you stick with the docs for the version you are using.
1.2.7 docs
2.0.0-beta docs
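As a rough sketch of what sticking with 1.X would look like in settings.py, assuming the project really is on Haystack 1.X: keep only the three 1.X-style settings from the question and drop the HAYSTACK_CONNECTIONS dictionary, which is the 2.X way of configuring backends. The values are copied from the question.

```python
# Haystack 1.X-style settings only (values taken from the question).
HAYSTACK_SITECONF = 'directory.search_sites'
HAYSTACK_SEARCH_ENGINE = 'solr'
HAYSTACK_SOLR_URL = 'http://127.0.0.1:8983/solr'

# HAYSTACK_CONNECTIONS is intentionally omitted: it belongs to the 2.X
# configuration style and is not part of this 1.X sketch.
```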
Hope that helps,
Ben

Related

Django: CheckConstraint return 500 when conditions are wrong

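I tried to use "models.CheckConstraint" to validate the birthday field like this:
from datetime import date

from django.contrib.auth.models import AbstractUser
from django.db import models
from django.db.models import Q


class CustomUser(AbstractUser):
    birthday = models.DateField(null=True)

    class Meta:
        constraints = [
            models.CheckConstraint(
                check=Q(birthday__lt=date.today()),
                name='check_birthday')
        ]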
When "birthday" < date.today(), it works fine, but when I enter a value of "birthday" > date.today(), it shows me an error:
IntegrityError at /api/user/
CHECK constraint failed: check_birthday
Request Method: POST
Django Version: 3.0.7
Python Version: 3.7.3
I followed the docs: https://docs.djangoproject.com/en/3.0/ref/models/options/#constraints
Please tell me why? Thank you.

IntegrityError is raised by the database layer, not by the view, serializer, or form. Your view therefore does not know how to handle it, and it surfaces as a server error (HTTP 500).
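One way to avoid the 500 is to reject the value before it reaches the database. The sketch below shows the idea in plain Python so it runs on its own; `BirthdayError` and `validate_birthday` are hypothetical names. In a Django project the same check would typically live in a field validator or a serializer's `validate_birthday` method and raise the framework's ValidationError instead, so the client gets a 400 response.

```python
from datetime import date


class BirthdayError(ValueError):
    """Hypothetical stand-in for the framework's ValidationError."""


def validate_birthday(value, today=None):
    """Reject birthdays that are not strictly in the past.

    `today` is injectable so the check is easy to test; it defaults to the
    current date at call time.
    """
    if today is None:
        today = date.today()
    # None is allowed because the model field is declared with null=True.
    if value is not None and value >= today:
        raise BirthdayError("Birthday must be earlier than today.")
    return value
```

Because the check runs before the save, the CHECK constraint in the database only acts as a last line of defence.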

Solr 8 upgrade and stream.body

I'm upgrading Solr from 6.x to 8.x. In the past, we used to build our request thusly in our PHP script:
$aPostData = array(
    'stream.body' => '{"add": {"doc":{...stuff here...}}',
    'commit' => 'true',
    'collection' => 'mycollection',
    'expandMacros' => 'false'
);
$oBody = new \http\Message\Body();
$oBody->addForm($aPostData);
and sent it to our Solr server at /solr/mycollection/update/json. That worked just fine in 6.x, but now that I've upgraded to 8.x, I'm receiving the following response from Solr:
{
  "responseHeader":{
    "status":400,
    "QTime":1
  },
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"missing content stream",
    "code":400
  }
}
Digging around, I ran across the following:
https://issues.apache.org/jira/browse/SOLR-10748
and
Solr error - Stream Body is disabled
I tried following the suggestions in both. For the first one, I now see a file called "configoverlay.json" in my ./conf directory, and it has those settings. For the second, I set up my requestParsers node with those attributes. Neither worked. I've searched around, but at this point I'm at my wits' end. How can I keep using "stream.body"? If I shouldn't be using "stream.body", is there some other request variable that I can or should use when sending my data? I couldn't find anything in the documentation. Perhaps I was looking in the wrong place?
Any help would be greatly appreciated.
thnx,
Christoph
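For reference, a commonly used alternative to "stream.body" is to send the JSON document as the raw POST body with a Content-Type of application/json, passing "commit" as a query parameter. The following is only a sketch of that request shape, written in Python rather than PHP. The host, port, and document fields are assumptions for illustration, and `build_update_request` is a hypothetical helper; the request is built but not sent.

```python
import json
import urllib.request


def build_update_request(base_url, collection, doc, commit=True):
    """Build (but do not send) a JSON update request for a Solr collection."""
    url = "{}/{}/update".format(base_url.rstrip("/"), collection)
    if commit:
        url += "?commit=true"
    # The document travels in the request body, not in a stream.body parameter.
    body = json.dumps({"add": {"doc": doc}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed host/port and made-up fields, for illustration only.
req = build_update_request(
    "http://localhost:8983/solr", "mycollection", {"id": "1", "title": "example"}
)
# urllib.request.urlopen(req) would actually send it to a running Solr server.
```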

Error while indexing documents in solr - SolrException

I am using the following code to index documents in a Solr server.
String urlString = "http://localhost:8080/solr";
SolrServer solr = new CommonsHttpSolrServer(urlString);
java.io.File file = new java.io.File("C:\\Users\\Guruprasad\\Desktop\\Search\\47975832.doc");
if (file.canRead()) {
    System.out.println("adding " + file);
    try {
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        String parts[] = file.getName().split("\\.");
        String type = "text";
        if (parts.length > 1) {
            type = parts[1];
        }
        req.addFile(file);
        req.setParam("literal.id", file.getAbsolutePath());
        req.setParam("literal.name", file.getName());
        req.setParam("literal.content_type", type);
        req.setParam("uprefix", "attr_");
        req.setParam("fmap.content", "attr_content");
        req.setAction(ACTION.COMMIT, true, true);
        solr.request(req); // line 36: the exception is thrown here
While executing this code, I get the following exception.
Exception: org.apache.solr.common.SolrException
Exception message:
Internal Server Error Internal Server Error request:
http://localhost:8080/solr/update/extract?literal.id=C:\Users\Guruprasad\Desktop\Search\47975832.doc&literal.name=47975832.doc&literal.content_type=doc&uprefix=attr_&fmap.content=attr_content&commit=true&waitFlush=true&waitSearcher=true&wt=javabin&version=2
Exception trace:
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at com.solr.search.test.IndexFiles.indexDocs(IndexFiles.java:36)
Any help would be appreciated.

I don't suggest using DIH to index your database data. You can use SolrJ instead, which is simple: if you can use JDBC, you can build Solr documents with SolrJ and commit them to the Solr server in batches. There is a SolrJ wiki; I hope it helps.
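The batching idea is independent of the client library. As a language-neutral sketch in Python (not SolrJ), the helper below groups database rows into fixed-size batches of documents, so that each batch could be sent and committed in one request. The names `in_batches` and `row_to_doc` and the two-column row layout are assumptions for illustration.

```python
def row_to_doc(row):
    """Map one database row to a document dict (assumed columns: id, name)."""
    return {"id": str(row[0]), "name": row[1]}


def in_batches(items, batch_size):
    """Yield lists of at most batch_size items, preserving order."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


# Made-up rows standing in for a JDBC-style result set.
rows = [(1, "a"), (2, "b"), (3, "c")]
batches = list(in_batches((row_to_doc(r) for r in rows), 2))
```

Each element of `batches` would then be posted to the server in a single update call, with one commit per batch or one at the end.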

Solr 5.0 comes with a built-in utility, the DIH handler, for indexing data from a database, which is what you are using. Its configuration is important and tricky, though. Could you please post your DIH handler configuration or share the import logs? It looks like a configuration problem to me.

nutch Unable to successfully parse content

I am trying to crawl using Nutch 1.4, but I'm getting a parsing error. This is the log file:
2012-01-09 09:12:02,696 INFO parse.ParseSegment - ParseSegment: starting at 2012-01-09 09:12:02
2012-01-09 09:12:02,697 INFO parse.ParseSegment - ParseSegment: segment: crawl/segments/20120109091153
2012-01-09 09:12:03,416 WARN parse.ParseUtil - Unable to successfully parse content http://sujitpal.blogspot.com/ of type application/xhtml+xml
2012-01-09 09:12:03,417 INFO parse.ParseSegment - Parsing: http://sujitpal.blogspot.com/
2012-01-09 09:12:03,418 WARN parse.ParseSegment - Error parsing: http://sujitpal.blogspot.com/: failed(2,200): org.apache.nutch.parse.ParseException: Unable to successfully parse content
2012-01-09 09:12:03,419 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
Checking config/nutch-site.xml, I found that html|text|xhtml|xml are included in the plugin.includes property:
<property>
  <name>plugin.includes</name>
  <value>myplugins|protocol-httpclient|query-(basic|site|url)|summary-basic|urlfilter-regex|parse-(xml|xhtml|html|tika|text|js)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)|query-(basic|site|url)|response-(json|xml)</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins. In order to use HTTPS please enable
  protocol-httpclient, but be aware of possible intermittent problems with the
  underlying commons-httpclient library.
  </description>
</property>
Why can't it parse xhtml/xml or even text/xml?

Which plugins have you configured? If you are using Tika, then Tika has a mapping from a MIME type such as application/xhtml+xml to a parser. If there is no entry for the type in the config file, nothing happens.
You could disable Tika and use only the parse-html plugin.
I tested your site with our default plugin config:
protocol-http|urlfilter-regex|parse-(html)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)
And got your page parsed.
Parsed (32ms):http://sujitpal.blogspot.com/
Greetings
JPee
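Since plugin.includes is a regular expression over plugin directory names, it can help to check which plugin names a given value actually admits. The sketch below does that in Python with the value from the answer above. It assumes a plugin is included only when its whole name matches the expression; the helper name `is_included` is hypothetical, and Nutch itself evaluates the expression in Java.

```python
import re

# The plugin.includes value from the answer, joined into one expression.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html)|index-(basic|anchor)"
    "|query-(basic|site|url)|response-(json|xml)"
    "|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def is_included(plugin_id, pattern=PLUGIN_INCLUDES):
    """Return True if the whole plugin name matches the includes expression."""
    return re.fullmatch(pattern, plugin_id) is not None


for name in ("parse-html", "parse-tika", "protocol-http", "protocol-httpclient"):
    print(name, is_included(name))
```

With this value, parse-html is admitted while parse-tika is not, which corresponds to the suggestion to disable Tika and use only the parse-html plugin.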

Querying Google AppEngine's Datastore using PHP (through Quercus) and low-level API isn't working

When I query Google AppEngine's datastore for an entity using PHP (through Quercus) and the low-level data-access API, I get an error saying that the entity doesn't exist, even though I've put it in the datastore previously.
The specific error is "com.caucho.quercus.QuercusException: com.google.appengine.api.datastore.DatastoreService.get: No entity was found matching the key: Test(value1)"
Here's the relevant code -
<?php
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;

$testkey = KeyFactory::createKey("Test", "value1");
$ent = new Entity($testkey);
$ent->setProperty("field1", "value2");
$ent->setProperty("field2", "value3");
$dataService = DatastoreServiceFactory::getDatastoreService();
$dataService->put($ent);
echo "Data entered";
try
{
    $ent = $dataService->get($testkey);
    echo "Data queried - the results are \n";
    echo "Field1 has value ".$ent->getProperty("field1")."\n";
    echo "Field2 has value ".$ent->getProperty("field2")."\n";
}
catch(EntityNotFoundException $e)
{
    echo("<br/>Entity test not found.");
    echo("<br/>Stack Trace is:\n");
    echo($e);
}
And here's the detailed stack-trace - link.
This same code runs fine in Java (of course after changing the syntax). I wonder what's wrong.
Thanks.

I have found the solution to my problem. It was caused by missing dependencies, and I solved it by using the prepackaged PHP Wordpress application available here.
One thing to note: the package overlooked a minor issue, namely that all files other than the src/ directory need to be in a war/ directory that sits alongside the src/ directory (this follows the App Engine conventions described in its documentation). So I organized the files that way myself, put the above PHP file in the war/ directory, and it's working fine on App Engine.