Faceting in Solr works on the full result set returned by the query, but I want it to be applied only to the documents that are returned after limiting rows to some value. For example, if q returns 1500 documents and I take the first 1000 rows, I want faceting to be applied to those 1000 documents instead of all 1500. I am writing a custom search component to generate the facets. The following is the implementation:
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.biginfolabs.subfacet.MapSorter;
public class SubFacetComponent extends SearchComponent {
private static Logger logger = LoggerFactory
private SolrQueryRequest req = null;
private long numDoc = 0;
// private static final String field = "annotations";
public String getDescription() {
// TODO Auto-generated method stub
return null;
public void prepare(ResponseBuilder arg0) throws IOException {
// TODO Auto-generated method stub
public void process(ResponseBuilder rb) throws IOException {
// TODO Auto-generated method stub
long startTime = System.currentTimeMillis();
req = rb.req;
String[] fields = {"annotations"};
// req.getParams().getParams("facet.field");
logger.info("Facet component");
if (rb.getResults() != null) {
DocList docs = rb.getResults().docList;
numDoc = docs.size();
for (String field : fields) {
DocIterator docItr = docs.iterator();
SolrIndexSearcher searcher = req.getSearcher();
Map<String, List<String>> map = new HashMap<String, List<String>>();
List<String> list = new ArrayList<String>();
while (docItr.hasNext()) {
int docId = docItr.nextDoc();
Document doc = searcher.doc(docId);
Set<String> keySet = new HashSet<String>();
if (doc.get(field) != null) {
IndexableField[] indexableFields = doc.getFields(field);
for (IndexableField indexableField : indexableFields) {
String str = indexableField.stringValue();
String[] pathVariableArray = str.split("/");
String key = "";
String separator = "";
for (String split : pathVariableArray) {
key += separator + split;
separator = "/";
for (String str : keySet) {
List<String> arrayList = new ArrayList<String>();
if (map.containsKey(str)) {
arrayList = map.get(str);
map.put(str, arrayList);
Map<String, Integer> finalMap = new HashMap<String, Integer>();
for (String key : map.keySet()) {
if (map.containsKey(key)) {
finalMap.put(key, map.get(key).size());
System.out.println(key + ": " + map.get(key).size());
Map<String, TreeMap<String, Integer>> facetMap = new HashMap<String, TreeMap<String, Integer>>();
TreeMap<String, Integer> sortedMap = new TreeMap(new MapSorter(
rb.rsp.add("subfacet", sortedMap);
} else {
logger.warn("You must specify 'subfacet' params in solr query !!");
long enTime = System.currentTimeMillis();
logger.info("Time taken to generate facets for " + numDoc
+ " documents is " + (enTime - startTime) + " ms");
I am trying to add the result to the response using rb.rsp.add("subfacet", sortedMap);. This seems to set subfacet in the rsp object, but the response returned to the Solr UI doesn't contain it. What am I missing here?
Following is my select request handler:
<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request -->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<arr name="last-components">
Edit: It works fine if Solr is used as a single node with a single shard, but not in cloud mode.
I fixed it by overriding public void finishStage(ResponseBuilder rb) and writing the logic there instead of in process().
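The counting step of the component above does not depend on Solr and can be tried on its own. The following is a minimal sketch, assuming the intent is to count, for every path prefix of the annotations values, how many documents contain it. The class name PathFacetCounter and the sample paths are made up for illustration and are not part of the original component.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class PathFacetCounter {

    // Counts, for each path prefix, the number of documents that contain it.
    // Each inner list holds the path values of one document.
    static Map<String, Integer> countPrefixes(List<List<String>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> paths : docs) {
            // Collect the prefixes per document first, so that a prefix shared
            // by several values of the same document is counted only once.
            Set<String> prefixes = new HashSet<>();
            for (String path : paths) {
                String key = "";
                String separator = "";
                for (String part : path.split("/")) {
                    key += separator + part;
                    separator = "/";
                    prefixes.add(key);
                }
            }
            for (String prefix : prefixes) {
                counts.merge(prefix, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two hypothetical documents with hierarchical annotation values.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("a/b/c", "a/d"),
                Arrays.asList("a/b"));
        System.out.println(countPrefixes(docs)); // {a=2, a/b=2, a/b/c=1, a/d=1}
    }
}
```

In the real component the inner lists would come from the stored field values of the documents in the DocList, so only the returned rows are counted.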
I have tried to create a custom query parser (CustomQueryParser) in which I also make use of the OpenNLP libraries.
My objective: given a query such as "How many defective rims are causing failure in ABC tyres in China",
I want the final query to be something like "defective rims failure tyres China",
which would then go to the Analyzer for further processing.
This is my code for the QueryParserPlugin:
package com.mycompany.lucene.search;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import com.mycompany.lucene.search.QueryParser;
public class QueryParserPlugin extends QParserPlugin {
public QParser createParser(String qstr, SolrParams localParams,
SolrParams params, SolrQueryRequest req) {
return new QueryParser(qstr, localParams, params, req, "body_txt_str");
And this is the code for my QueryParser:
package com.mycompany.lucene.search;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.SyntaxError;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
public class QueryParser extends QParser {
private String fieldName;
public QueryParser(String qstr, SolrParams localParams, SolrParams params,
SolrQueryRequest req,
String defaultFieldName) {
super(qstr, localParams, params, req);
fieldName = localParams.get("field");
if (fieldName == null) {
fieldName = params.get("df");
public Query parse() throws SyntaxError {
Analyzer analyzer = req.getSchema().getQueryAnalyzer();
InputStream tokenModelIn = null;
InputStream posModelIn = null;
try {
tokenModelIn = new FileInputStream("/Files/en-token.bin");
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
TokenizerModel tokenModel = null;
try {
tokenModel = new TokenizerModel(tokenModelIn);
} catch (IOException e) {
// TODO Auto-generated catch block
Tokenizer tokenizer = new TokenizerME(tokenModel);
String tokens[] = tokenizer.tokenize(qstr);
try {
posModelIn = new FileInputStream("/Files/en-pos-maxent.bin");
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
// loading the parts-of-speech model from stream
POSModel posModel = null;
try {
posModel = new POSModel(posModelIn);
} catch (IOException e) {
// TODO Auto-generated catch block
// initializing the parts-of-speech tagger with model
POSTaggerME posTagger = new POSTaggerME(posModel);
// Tagger tagging the tokens
String tags[] = posTagger.tag(tokens);
String final_query = "";
for(int i=0;i<tokens.length;i++){
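// compare tag strings with equals(); == only tests reference identity in Java
if ("JJ".equals(tags[i]) || "NNS".equals(tags[i]) || "NN".equals(tags[i])) {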
final_query = final_query + " " +tokens[i];
TermQuery tq= new TermQuery(new Term(fieldName,final_query));
return tq;
I then exported this as a jar and added the jars to my solrconfig.xml:
<lib dir="${solr.install.dir:../../../..}/contrib/customparser/lib"
regex=".*\.JAR" />
<lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lib"
regex="opennlp-.*\.jar" />
But I am getting the error below:
Caused by:
java.lang.NoClassDefFoundError: opennlp/tools/tokenize/Tokenizer
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:541)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:488)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:786)
at org.apache.solr.core.PluginBag.createPlugin(PluginBag.java:135)
at org.apache.solr.core.PluginBag.init(PluginBag.java:271)
at org.apache.solr.core.PluginBag.init(PluginBag.java:260)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:957)
... 9 more
This is my first time creating a CustomQueryParser. Could you please help me out?
Most probably your path
doesn't contain the relevant OpenNLP jars, or the regex is not appropriate.
That's the first thing to check.
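One quick way to check the regex part outside Solr is to test the patterns against the jar file names with java.util.regex. This is only a sketch: it assumes Solr applies the regex attribute as an ordinary, case-sensitive Java regular expression to each file name, and the file names below are hypothetical.

```java
import java.util.regex.Pattern;

class LibRegexCheck {

    // Returns true if the whole file name matches the given regex.
    static boolean matches(String regex, String fileName) {
        return Pattern.compile(regex).matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Java regexes are case-sensitive by default, so an upper-case
        // ".JAR" suffix in the pattern does not match a lower-case ".jar" file.
        System.out.println(matches(".*\\.JAR", "customparser.jar"));               // false
        System.out.println(matches(".*\\.jar", "customparser.jar"));               // true
        System.out.println(matches("opennlp-.*\\.jar", "opennlp-tools-1.8.3.jar")); // true
    }
}
```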
You have to either bundle the OpenNLP dependencies into your custom query parser jar (e.g. if you use Maven to build your project, with maven-assembly-plugin, maven-shade-plugin, etc.) or make sure the OpenNLP jars are actually matched by the relevant lib directive in your solrconfig.xml.
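Separately from the classpath problem, note that the tag filter in the question compares strings with ==, which in Java checks reference identity rather than content, so tags produced at runtime will usually not match. A minimal sketch of the filtering step using a set lookup follows; the class name TagFilter and the hand-written tokens and tags are assumptions for illustration, not OpenNLP output.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

class TagFilter {

    // Part-of-speech tags whose tokens should be kept in the final query.
    private static final Set<String> KEPT_TAGS =
            new HashSet<>(Arrays.asList("JJ", "NN", "NNS"));

    // Joins the tokens whose tag is in KEPT_TAGS with single spaces.
    static String keepTagged(String[] tokens, String[] tags) {
        StringJoiner joined = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            // Set.contains uses equals(), so it compares string content.
            if (KEPT_TAGS.contains(tags[i])) {
                joined.add(tokens[i]);
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"How", "many", "defective", "rims"};
        // new String(...) mimics tags created at runtime (not interned literals).
        String[] tags = {new String("WRB"), new String("JJ"),
                         new String("JJ"), new String("NNS")};
        System.out.println(keepTagged(tokens, tags)); // many defective rims
    }
}
```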
I'm having trouble uploading entities to the Cloud Datastore via the Apache Beam Java SDK (2.1.0). The following is my code:
import com.google.cloud.datastore.DatastoreOptions
import com.google.cloud.datastore.Entity
import com.opencsv.CSVParser
import org.apache.beam.runners.dataflow.DataflowRunner
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.io.TextIO
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.sdk.transforms.DoFn
import org.apache.beam.sdk.transforms.MapElements
import org.apache.beam.sdk.transforms.ParDo
import org.apache.beam.sdk.transforms.SimpleFunction
import java.io.Serializable
object PipelineClass {
class FoodGroup(var id: String? = null,
var group: String? = null) : Serializable
class CreateGroupsFn : SimpleFunction<String, FoodGroup>() {
override fun apply(line: String?): FoodGroup {
val group = FoodGroup()
val parser = CSVParser()
val parts = parser.parseLine(line)
group.id = parts[0].trim()
group.group = parts[1].trim()
return group
class CreateEntitiesFn : DoFn<FoodGroup, Entity>() {
fun processElement(c: ProcessContext) {
val datastore = DatastoreOptions.getDefaultInstance().service
val keyFactory = datastore.newKeyFactory()
val key = datastore.allocateId(keyFactory.newKey())
val entity = Entity.newBuilder(key)
.set("id", c.element().id)
.set("group", c.element().group)
@JvmStatic fun main(args: Array<String>) {
val options =
options.runner = DataflowRunner::class.java
options.project = "simplesample"
options.jobName = "fgUpload"
val pipeline = Pipeline.create(options)
//error occurs below...
The following is the error I get:
PipelineClass.kt: (75, 24): Type mismatch: inferred type is
DatastoreV1.Write! but PTransform<in PCollection<Entity!>!, PDone!>!
was expected
I've tried SimpleFunction, DoFn, and PTransform (composite and non-composite) to do the transform from String to Entity with no success.
What am I doing wrong?
EDIT: I've finally managed to get my entities in the Datastore. I decided to use Dataflow 1.9.1 and ditched Beam (2.1.0) after seeing this example. Below is my code:
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.io.datastore.DatastoreIO;
import com.google.cloud.dataflow.sdk.options.Default;
import com.google.cloud.dataflow.sdk.options.Description;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.datastore.v1.Entity;
import com.google.datastore.v1.Key;
import com.opencsv.CSVParser;
import javax.annotation.Nullable;
import java.util.UUID;
import static com.google.datastore.v1.client.DatastoreHelper.makeKey;
import static
public class PipelineClass {
static class CreateEntitiesFn extends DoFn<String, Entity> {
private final String namespace;
private final String kind;
private final Key ancestorKey;
CreateEntitiesFn(String namespace, String kind) {
this.namespace = namespace;
this.kind = kind;
ancestorKey = makeAncestorKey(namespace, kind);
Entity makeEntity(String id, String group) {
Entity.Builder entityBuilder = Entity.newBuilder();
Key.Builder keyBuilder = makeKey(ancestorKey, kind,
if (namespace != null) {
return entityBuilder.build();
public void processElement(ProcessContext c) throws Exception {
CSVParser parser = new CSVParser();
String[] parts = parser.parseLine(c.element());
String id = parts[0];
String group = parts[1];
c.output(makeEntity(id, group));
static Key makeAncestorKey(@Nullable String namespace, String kind) {
Key.Builder keyBuilder = makeKey(kind, "root");
if (namespace != null) {
return keyBuilder.build();
public interface Options extends PipelineOptions {
@Description("Path of the file to read from and store to Cloud
String getInput();
void setInput(String value);
@Description("Dataset ID to read from Cloud Datastore")
String getProject();
void setProject(String value);
@Description("Cloud Datastore Entity Kind")
String getKind();
void setKind(String value);
@Description("Dataset namespace")
String getNamespace();
void setNamespace(@Nullable String value);
@Description("Number of output shards")
int getNumShards();
void setNumShards(int value);
public static void main(String args[]) {
Options options =
Pipeline p = Pipeline.create(options);
CreateEntitiesFn(options.getNamespace(), options.getKind())))
I am new to Mule ESB. I am trying to handle a file attachment with an HTTP listener and pass it to a REST web service.
I have created a simple flow, but I don't know how to handle the attachment in Mule so that it can be passed to the RESTful web service.
Any help is greatly appreciated!
Below is the simple flow that I assume should work:
REST web service code:
package com.one.file;
import java.io.File;
import java.util.Iterator;
import java.util.List;
import javax.servlet.http.HttpServletRequest;
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.FileItemFactory;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
public class RESTMultipleFileUpload {
private static final String FILE_UPLOAD_PATH = "C:\\Users\\charan\\Documents\\webservice\\";
//private static final String CANDIDATE_NAME = "candidateName";
private static final String SUCCESS_RESPONSE = "Successful";
private static final String FAILED_RESPONSE = "Failed";
public String registerWebService(@Context HttpServletRequest request)
String responseStatus = SUCCESS_RESPONSE;
String candidateName = null;
System.out.println("first ");
//checks whether there is a file upload request or not
if (ServletFileUpload.isMultipartContent(request))
final FileItemFactory factory = new DiskFileItemFactory();
final ServletFileUpload fileUpload = new ServletFileUpload(factory);
System.out.println("t ");
// parseRequest returns a list of FileItem,
// but in old (pre-Java 5) style
final List items = fileUpload.parseRequest(request);
if (items != null)
final Iterator iter = items.iterator();
while (iter.hasNext())
final FileItem item = (FileItem) iter.next();
final String itemName = item.getName();
final String fieldName = item.getFieldName();
final String fieldValue = item.getString();
if (item.isFormField())
candidateName = fieldValue;
System.out.println("Field Name: " + fieldName + ", Field Value: " + fieldValue);
System.out.println("Candidate Name: " + candidateName);
final File savedFile = new File(FILE_UPLOAD_PATH + File.separator
+ itemName);
System.out.println("Saving the file: " + savedFile.getName());
catch (FileUploadException fue)
responseStatus = FAILED_RESPONSE;
catch (Exception e)
responseStatus = FAILED_RESPONSE;
System.out.println("Returned Response Status: " + responseStatus);
return responseStatus;
I have a problem with my huge nquad file (about 4000 lines) when I execute a BooleanQuery.
I try a query such as:
Query query1 = new TermQuery(new Term(FIELD_CONTENTS, "Albania"));
Query query2 = new TermQuery(new Term(FIELD_CONTENTS, "Hitchcock"));
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(query1, BooleanClause.Occur.MUST);
booleanQuery.add(query2, BooleanClause.Occur.MUST);
This query works correctly when the words I search for are in lines numbered below 780; for words in lines above 780 it fails.
This is a snippet of my nquad file:
<http://dbpedia.org/resource/A_Clockwork_Orange> <http://dbpedia.org/ontology/numberOfPages> "192"^^<http://www.w3.org/2001/XMLSchema#positiveInteger> <http://en.wikipedia.org/wiki/A_Clockwork_Orange?oldid=606117686#absolute-line=12> .
I wrote a custom analyzer to distinguish tokens:
import java.io.Reader;
import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
class TestAnalyzer1 extends Analyzer {
public static final String[] TEST_STOP_WORDS = { "http", "https",
"resource", "foaf/0.1", "dbpedia.org", "en.wikipedia.org",
"xmlns.com", "purl.org", "elements/1.1",
"www.w3.org/2001/XMLSchema", "www.w3.org/1999/02/22-rdf",
"www.w3.org/2003/01", "oldid", "wiki" };
private Set stopWords = StopFilter.makeStopSet(TEST_STOP_WORDS);
public TokenStream tokenStream(String fieldName, Reader reader) {
TokenStream ts = new StandardTokenizer(reader);
ts = new StandardFilter(ts);
ts = new StopFilter(ts, stopWords);
return ts;
This is the main class:
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hit;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
public class TestPreFinal {
public static final String FILES_TO_INDEX_DIRECTORY = "filesToIndex_1";
public static final String INDEX_DIRECTORY = "indexDirectory";
public static final String FIELD_PATH = "path";
public static final String FIELD_CONTENTS = "contents";
public static void main(String[] args) throws CorruptIndexException,
LockObtainFailedException, IOException, ParseException {
long startTime = System.currentTimeMillis();
Analyzer analyzer = new TestAnalyzer1();
IndexWriter indexWriter = new IndexWriter(INDEX_DIRECTORY, analyzer,
File dir = new File(FILES_TO_INDEX_DIRECTORY);
File[] files = dir.listFiles();
for (File file : files) {
Reader reader = new FileReader(file);
Document document = new Document();
String path = file.getCanonicalPath();
Field fieldPath = new Field(FIELD_PATH, path, Field.Store.YES,
Field fieldContents = new Field(FIELD_CONTENTS, reader,
Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
IndexSearcher indexSearcher = new IndexSearcher(directory);
IndexReader indexReader = IndexReader.open(directory);
Query query1 = new TermQuery(new Term(FIELD_CONTENTS, "Albania"));
Query query2 = new TermQuery(new Term(FIELD_CONTENTS, "Hitchcock"));
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(query1, BooleanClause.Occur.MUST);
booleanQuery.add(query2, BooleanClause.Occur.MUST);
Hits hits = indexSearcher.search(booleanQuery);
@SuppressWarnings({ "unchecked" })
Iterator<Hit> it = hits.iterator();
TermFreqVector tfv = null;
while (it.hasNext()) {
Hit hit = it.next();
Document document = hit.getDocument();
String path = document.get(FIELD_PATH);
System.out.println("Hit: " + path);
for (int i = 0; i < hits.length(); i++) {
tfv = indexReader.getTermFreqVector(i, FIELD_CONTENTS);
I do not know what else to do. Can you help, please? Thanks in advance.
I am using Solr 4.0 on a Jetty server. I want to query Solr using SolrJ and expect the results to be formatted as XML. So I used HttpSolrServer (CloudSolrServer and LBHttpSolrServer do not provide support for setting the parser) and set the parser to XMLResponseParser. Moreover, I am also setting the SolrQuery param wt=xml. But I am not able to get the results in XML. Here is my test code:
package solrjtest;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.UUID;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
class SolrjTest
public static void main(String[] args) throws IOException, SolrServerException
SolrjTest solrj = new SolrjTest();
public void query(String q) throws IOException, SolrServerException
CommonsHttpSolrServer server = null;
String uuid = null;
boolean flag = true;
while (flag == true)
uuid = UUID.randomUUID().toString();
File f = new File("D:/SearchResult/" + uuid + ".txt");
if (!f.exists())
server = new CommonsHttpSolrServer("http://skyfall:8983/solr/documents");
server.setParser(new XMLResponseParser());
catch (Exception e)
SolrQuery query = new SolrQuery();
query.setParam("wt", "xml");
FileWriter fw = new FileWriter("D:/SearchResult/" + uuid + ".txt");
QueryResponse qr = server.query(query);
SolrDocumentList sdl = qr.getResults();
XMLResponseParser r = new XMLResponseParser();
Object[] o = new Object[sdl.size()];
o = sdl.toArray();
for (int i = 0; i < o.length; i++)
fw.write(o[i].toString() + "\n");
catch (SolrServerException e)
Any idea what's going wrong here?
With that setup, the Solr server on the machine skyfall does send the response in XML, and the CommonsHttpSolrServer wrapper does parse that XML correctly. However, that does not change the internal representation in the QueryResponse, which is just a thin wrapper around the Solr class NamedList.
You can (mis)use the XMLResponseWriter to get an XML representation of the full QueryResponse:
private String toXML(SolrParams request, QueryResponse response) {
    XMLResponseWriter xmlWriter = new XMLResponseWriter();
    Writer w = new StringWriter();
    SolrQueryResponse sResponse = new SolrQueryResponse();
    // copy the parsed NamedList into the server-side response object,
    // otherwise the writer has nothing to serialize
    sResponse.setAllValues(response.getResponse());
    try {
        xmlWriter.write(w, new LocalSolrQueryRequest(null, request), sResponse);
    } catch (IOException e) {
        throw new RuntimeException("Unable to convert Solr response into XML", e);
    }
    return w.toString();
}