Solr search over MS SQL Server: an updated tutorial needed

I am trying to configure Solr over MS SQL Server.
I found only this tutorial, which is a bit old (2011).
Is there an updated tutorial?
Is there an official tutorial?

Steps to configure Solr on Tomcat:
http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/
Everything about the DataImportHandler (DIH) can be found here:
http://wiki.apache.org/solr/DataImportHandler
After creating the DIH configuration file (e.g. data-config.xml), register the request handler in solrconfig.xml as follows:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">/home/username/data-config.xml</str>
</lst>
</requestHandler>
Sample MS SQL Server DIH configuration (data-config.xml):
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://myserver;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor"
user="sa"
password="password"/>
<document>
<entity name="results" query="SELECT statements">
<field column="fielda" name="fielda"/>
<field column="fieldb" name="fieldb"/>
<field column="fieldc" name="fieldc"/>
</entity>
</document>
</dataConfig>
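To make the placeholder query above more concrete, here is a minimal sketch of the same configuration with an actual SELECT. The table dbo.Products and its columns are hypothetical, and the target fields id, name and price are assumed to exist in the Solr schema; adjust both to your own database and schema.

```xml
<dataConfig>
  <!-- Connection settings copied from the sample above; replace with your own. -->
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://myserver;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor"
              user="sa"
              password="password"/>
  <document>
    <!-- Hypothetical table and columns, for illustration only. -->
    <entity name="product"
            query="SELECT ProductId, ProductName, Price FROM dbo.Products">
      <!-- Map each SQL column to a field assumed to exist in the Solr schema. -->
      <field column="ProductId" name="id"/>
      <field column="ProductName" name="name"/>
      <field column="Price" name="price"/>
    </entity>
  </document>
</dataConfig>
```

With the /dataimport handler registered as shown earlier, an import is typically started by requesting /dataimport?command=full-import on the core.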

Related

Solr 6.0: disable OCR for a specific core

I'm trying to disable OCR for a specific core.
I have Solr 6.0 and configured /update/extract as follows:
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="tika.config">C:\solr-6.0.0\contrib\extraction\lib\tika.config</str>
</lst>
</requestHandler>
and the file tika.config contains:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser">
<parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
</parser>
</parsers>
</properties>
It doesn't work: OCR is still enabled. Any idea how to disable it? (I have Tesseract installed and I need it, so please don't suggest removing it.)
Thanks!

Share config files across environments in Solr

We have several environments (including several development, staging and production environments). Currently we copy and paste the Solr conf folders and set up solr-data-config.xml for each environment, because the file contains the environment's details:
<dataConfig>
<dataSource name="ds-db" type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://10.0.0.40:3306/***"
user="***"
password="**"/>
How can we separate the Solr config from the environment data, so that we have just one config folder per search group and keep the environment data separate?
I would recommend externalising the environment-dependent parameters:
1) DIH
You can do this with placeholders, e.g.:
<dataConfig>
<dataSource name="ds-db" type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="${dataimporter.request.url}"
user="${dataimporter.request.user}"
password="${dataimporter.request.password}"/>
2) Solrconfig
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
<str name="clean">true</str>
...
<str name="url">${db.url:defaultUrl}</str>
<str name="user">${db.user:defaultUser}</str>
<str name="password">${db.password:}</str>
...
</lst>
</requestHandler>
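${property.name:defaultValue} is the syntax to use [1]: the name before the colon is a Java system property, and the text after it is the default used when that property is not set.
You then need to pass the values as Java system properties to the Solr Java process.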
[1] https://lucene.apache.org/solr/guide/6_6/configuring-solrconfig-xml.html#Configuringsolrconfig.xml-JVMSystemProperties
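As a rough illustration of how the pieces fit together, the sketch below repeats the handler definition with concrete fallback values. The property names db.url, db.user and db.password come from the answer above; the JDBC URLs, user names and JVM flag values are hypothetical.

```xml
<!-- solrconfig.xml: handler defaults resolved from JVM system properties.
     Assumption: the Solr JVM is started with flags such as
       -Ddb.url=jdbc:mysql://10.0.0.40:3306/mydb -Ddb.user=solr_reader
     (hypothetical values). If a property is not set, the text after the
     colon is used as the default. -->
<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="url">${db.url:jdbc:mysql://localhost:3306/devdb}</str>
    <str name="user">${db.user:dev_user}</str>
    <!-- Empty default: no password unless db.password is supplied. -->
    <str name="password">${db.password:}</str>
  </lst>
</requestHandler>
```

With this arrangement the conf folder can stay identical across environments, and only the JVM flags differ per machine.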

Cassandra and Solr integration: Unable to execute query

I'm trying to integrate Cassandra and Solr.
I'm using Solr 6.6.0, Cassandra 3.10 and Java 8.
I added these lines to my solrconfig.xml:
<lib dir="/home/bkoganti/solr-6.6.0/contrib/dataimporthandler/" regex="cassandra-jdbc-.*\.jar"/>
<lib dir="/home/bkoganti/solr-6.6.0/contrib/dataimporthandler/" regex="cassandra-all-.*\.jar"/>
<lib dir="/home/bkoganti/solr-6.6.0/contrib/dataimporthandler/" regex="cassandra-thrift-.*\.jar"/>
<lib dir="/home/bkoganti/solr-6.6.0/contrib/dataimporthandler/" regex="libthrift-.*\.jar"/>
<lib dir="/home/bkoganti/solr-6.6.0/contrib/dataimporthandler/" regex="cassandra-driver-core-*\.jar"/>
.
.
.
.
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">sample-data-config.xml</str>
</lst>
</requestHandler>
sample-data-config.xml:
<dataConfig>
<dataSource type="JdbcDataSource" driver="org.apache.cassandra.cql.jdbc.CassandraDriver" url="jdbc:cassandra://127.0.0.1:9160/demo" autoCommit="true"/>
<document name="content">
<entity name="test" query="SELECT id,org,name,dep,place,sal from tutor" autoCommit="true">
<field column="id" name="id" />
<field column="org" name="org" />
<field column="name" name="name" />
<field column="dep" name="dep" />
<field column="place" name="place" />
<field column="sal" name="sal" />
</entity>
</document>
</dataConfig>
I added these to the managed schema:
<field name="org" type="string" indexed="true" stored="true" required="true" />
<field name="dep" type="string" indexed="true" stored="true" required="true" />
<field name="place" type="string" indexed="true" stored="true" required="true" />
<field name="sal" type="string" indexed="true" stored="true" required="true" />
When I run Solr and try to import data from the sample core, the import fails. I keep getting this error.
I can't figure out where I'm going wrong. Could someone help me out? Thanks in advance.
The last version of Cassandra supported by the JDBC driver is 1.2.5. Later, DataStax developed the drivers needed to connect Cassandra to Java applications. However, the DataStax drivers cannot be used with Solr, as they have their own DSE search engine.
This JDBC driver is compatible with the latest versions of Cassandra. Using it, I was able to integrate the two.
This JDBC driver also works.
That error means Solr cannot reach Cassandra via JDBC. First, check that you can connect to Cassandra from the Solr host with Java, using a database tool such as SQuirreL SQL or something similar.
Once you have verified that you can access it this way, move on to Solr. Something is preventing the connection (a firewall, a wrong port, who knows...).
Use nodetool enablethrift for Cassandra. Your error clearly says that Solr is unable to connect: connection refused. After you enable Thrift, check it using nodetool info. Hope this solves your problem.

Errors while trying to configure Solr 5.3.1 on Windows 10

I'm trying to set up a very basic configuration of Solr to read some text from a MySQL table and index it. I'm following the steps in the DIH Quick Start document.
The document doesn't tell you where to place solrconfig.xml.
At first I tried placing it under the solr5.3.1 folder (next to bin). That failed. Then I noticed the "add core" button was looking for it in server\solr\new_core, so I put it there, but then got this other error:
My data import handler looks like this:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
And here's data-config.xml:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/ctcrets"
user="root"
password="xxxx"/>
<document>
<entity name="id"
query="select RETS_STAGE1_QUEUE_ID as id, LN_LIST_NUMBER as name, xmlText as desc from RETS_STAGE1_QUEUE">
</entity>
</document>
</dataConfig>
What could be the problem?
The document assumes you already know the solr.home [1] directory structure. On top of that, I think it assumes you started the sample Solr instance (e.g. ./solr start -p 8984), where everything should already be set up.
Once Solr is started, the dashboard shows exactly where the configuration is located. Go there, change the files as suggested and RELOAD the core through the admin console (CoreAdmin). Alternatively, you can stop and restart Solr.
As side notes:
The DIH is not part of the Solr core, so solrconfig.xml needs a "lib" directive for it. As far as I remember, the sample config already has those directives, so you don't need to "import" the DIH lib yourself.
The JDBC driver that allows the connection to the database is not included, so your classpath (i.e. the JVM or Solr classpath, through the same lib directive) must include this additional lib.
[1] http://www.solrtutorial.com/configuring-solr.html
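The two side notes above could translate into solrconfig.xml directives along these lines. This is only a sketch: the first directive follows the pattern used in Solr's sample configs, while the directory holding the MySQL driver jar is a hypothetical location chosen for illustration.

```xml
<!-- solrconfig.xml -->
<!-- DIH jar shipped in the dist/ directory of the Solr distribution. -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<!-- Hypothetical directory where the MySQL JDBC driver jar was copied. -->
<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib/" regex="mysql-connector-java-.*\.jar" />
```

After editing solrconfig.xml, the core needs a RELOAD (or Solr a restart) for the new lib directives to take effect.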

Configure DataImportHandler in SolrCloud with ZooKeeper

I have a SolrCloud setup configured as described in "exploration of SolrCloud"; the difference is that I use Solr 4.0.0 Beta. Briefly, the configuration is:
ZooKeeper on default port 2181
3 instances of Solr running on different ports
This is just for testing purposes. The desired configuration has 3 ZooKeeper instances (one for every Solr instance). I managed to index some XML files with the curl command.
Questions:
How can I configure DIH for a collection? I managed to change solrconfig.xml (config for the dataimport handler) and add the proper driver for the DB connection to lib, but in the Solr admin I get "sorry, no dataimport-handler defined!" The changes are visible in ZooKeeper (I see data_config.xml), and in the Solr admin panel I can see the updated version of solrconfig.xml.
Is there any good tutorial for a production deployment of SolrCloud (with something like the desired configuration mentioned above) on a single machine or multiple machines running Ubuntu 12.04 LTS?
Any advice would be appreciated! Thanks in advance!
Normally, the DIH config has nothing to do with whether you're using a single Solr instance or multiple instances in a SolrCloud setup. DIH writes data to the current instance's Lucene index, and SolrCloud, coordinated through ZooKeeper, then spreads it to the other instances.
Make sure your DIH is properly configured:
In solrconfig.xml, make sure all necessary libraries are loaded. This means the two DIH jars:
<lib dir="../../../dist/" regex="solr-dataimporthandler-4.3.0.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-extras-4.3.0.jar" />
as well as any other jars you may need (such as the database JDBC driver).
Still in solrconfig.xml, make sure the DIH handler is declared, something like this:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
Finally, the config file you declared in the DIH handler (data-config.xml) should be in the same "conf" dir as solrconfig.xml and should have proper content, something like:
<dataConfig>
<dataSource type="JdbcDataSource" name="myDataSource" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@someHost:1521:someDb" user="someUser" password="somePassword" batchSize="5000"/>
<document name="myDoc" >
<entity name="myDoc" dataSource="myDataSource" transformer="my.custom.Transformer" query="select col1, col2, col3 from table1 where whatever" />
</document>
</dataConfig>
