Integration of Solr with EMC Documentum - solr

We have a bunch of PDF documents stored in EMC Documentum.
We are required to integrate Apache Solr with Documentum, so that we can search for a specific document in Solr and then retrieve it from Documentum.
I looked at the link below, but it does not provide sufficient information:
https://community.emc.com/docs/DOC-6520
Help is really appreciated.

The link you have posted would get you a working solution. That author proposes to write a custom crawler that connects to the Documentum repository and then use Apache Tika to perform the content extraction for Solr.
However, I would suggest you use:
Apache ManifoldCF to act as the crawler that gets the content from Documentum to Solr. You should not write this by hand, as it has already been done and tested.
Apache ManifoldCF is an effort to provide an open source framework for connecting source content repositories like Microsoft Sharepoint and EMC Documentum, to target repositories or indexes, such as Apache Solr, Open Search Server, or ElasticSearch. Apache ManifoldCF also defines a security model for target repositories that permits them to enforce source-repository security policies.
Apache Tika to perform the content extraction (PDF to text) so that the content of the documents is searchable in Solr later on.
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
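To make the Tika step a bit more concrete: Solr ships an extracting request handler (Solr Cell) that runs Tika on an uploaded file and indexes the extracted text. The following is only a rough sketch of how the request URL for such an upload could be built. The host, the core name `documents`, the id `doc-0001`, and the assumption that the handler is mounted at `/update/extract` are all illustrative and depend on your solrconfig.xml. It does not replace the ManifoldCF crawler recommended above.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, core, doc_id, commit=True):
    """Build the URL for posting one binary file to Solr's extract handler.

    Assumes the handler is registered at /update/extract, which is the
    conventional path but is configurable in solrconfig.xml.
    """
    params = {
        "literal.id": doc_id,  # literal.<field> sets a field value on the indexed document
        "commit": "true" if commit else "false",
    }
    return f"{base_url.rstrip('/')}/{core}/update/extract?{urlencode(params)}"


# Hypothetical host, core name, and document id.
url = build_extract_url("http://localhost:8983/solr", "documents", "doc-0001")
print(url)
```

The PDF itself would then be sent as the body of a POST request to that URL with any HTTP client.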

I have built my own connector to extract data from Documentum and insert it into Elasticsearch or Solr, and I am willing to share it. Please contact me.

Related

Choose Lucene or Solr

We need to integrate a search engine into our platform, a catalog management application in SharePoint. The information is stored in multiple databases and in a file store (doc, ppt, pdf, ...). Our development platform is ASP.NET, and we have done some preliminary work with Lucene and found it to be good. However, we have only just learned about Solr.
We need to continue using Lucene, but we have to justify that choice over Solr.
Any help is appreciated.
And sorry for my English.
Lucene is a full-text search library used to provide search functionalities to an application. It can't be used as an application by itself. Solr is a complete search engine built around Lucene providing its search functionalities and others. Solr is a web application that can be used by itself without any development around it.
If you need a search engine that your application calls, I recommend you use Solr.
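To illustrate what "called by your application" means in practice: Solr is queried over HTTP, so the application only has to build a URL and parse the response. Here is a minimal sketch of building such a query URL. The host, the core name `catalog`, and the field `title` are made up for the example, and the same request could be issued from ASP.NET or any other platform with an HTTP client.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's standard /select search handler."""
    params = {
        "q": query,    # the query string, e.g. field:value
        "rows": rows,  # maximum number of results to return
        "wt": "json",  # ask for a JSON response
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical host, core, and field name.
url = build_select_url("http://localhost:8983/solr", "catalog", "title:lucene")
print(url)
```

With plain Lucene, by contrast, the indexing and searching code runs inside your own process and you write that layer yourself.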

Solr instance for sitecore 7

I am trying to integrate Solr with Sitecore, but I could not find any way to create a Solr instance for it. I have a few PDFs from SDN, but I could not find a way to create a Solr instance in any of them. Considering that I am new to CMS, I hope I can get some help here. Thank you.
There are lots of resources available for setting up Solr, and integrating Sitecore.
Essentially, Sitecore is agnostic about how you set up Solr (barring a few exceptions), so you need to follow the standard methods to set Solr up. If you are doing this on your local machine, I recommend you simply download Solr and run it through the provided Jetty app server.
Once Solr is running, download the Solr Extensions from SDN, then follow the search scaling guide to integrate Solr. This really boils down to the following:
Remove Lucene config files
Add Solr config files and binaries
Add Solr endpoint into relevant config
Generate Solr Schema via Sitecore -> Control Panel -> Search (within Sitecore)
Add Schema file to Solr Core configuration
et voila
There is a great guide here: http://www.dansolovay.com/2013/05/setting-up-solr-with-sitecore-7.html

Can we use Kibana for Apache Solr not using elasticsearch

How can I integrate Kibana with Apache Solr instead of Elasticsearch?
If it cannot be done, what are the alternatives to Kibana for Solr?
At LucidWorks, we have ported Kibana to work with Solr and released it as open source.
If you want a bundled package, you can download that at http://www.lucidworks.com/lucidworks-silk/.
Our port for Kibana for Solr is bundled with Solr 4.7.0 and can be used as a query engine to build dashboards from indexes within the bundled Solr instance and/or located on other Solr instances.
The source code is available at https://github.com/LucidWorks/banana.
We have also included Solr Output Writer for LogStash with that bundle; however, you can use any ETL and indexing mechanism to get time series data into Solr. Links to this github repository are available on the LucidWorks link above.
HUE is an alternative search UI for Solr, although it is not as good as Kibana for search at the moment.
You can use SiLK for sure but you are better off using the fully integrated dashboards module that comes with Lucidworks Fusion. Fusion will save you a ton of time and make it easier to focus on the search stuff that matters - like building a recommender engine, creating data-driven user experience, driving data enrichment with entity recognition and integrating with Big Data software like Hadoop.

How to do indexing data from database using apache solr with glassfish server on linux?

I want to create a search box in my web app using Apache Lucene and Apache Solr. I am using a Postgres database and have to do it in Java.
As I am new to these concepts (Solr, Lucene), I am struggling with this. I have already installed and configured Apache Solr with GlassFish. Now I don't know how to start: do I have to create a Java project in Eclipse, or do I have to use the Solr admin GUI?
Can anyone help me with this?
Thanks in advance.
In order to make data searchable, you have to first index your data. You can use one of the following ways to index data.
By using Solr clients such as Solrj
If you store your data in relational DB then you can use DataImportHandler
By posting XML or JSON messages. Check here for documentation.
When new data is added, you can index it using Solr clients (Solrj). You can also search your data using Solrj or any other client library.
You can find other client libraries here.
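As a rough illustration of the "posting JSON messages" option: Solr's `/update` handler accepts a JSON array of documents. The sketch below only prepares the request (URL, headers, body) and does not send it. The host, the core name `products`, and the fields `id` and `title` are assumptions for the example and must match your own schema.

```python
import json


def build_update_request(base_url, core, docs, commit=True):
    """Prepare a JSON update request for Solr's /update handler."""
    url = f"{base_url.rstrip('/')}/{core}/update"
    if commit:
        url += "?commit=true"  # make the documents searchable right away
    headers = {"Content-Type": "application/json"}
    body = json.dumps(docs).encode("utf-8")  # a JSON array of documents
    return url, headers, body


# Hypothetical documents; field names must exist in your schema.
docs = [
    {"id": "1", "title": "First product"},
    {"id": "2", "title": "Second product"},
]
url, headers, body = build_update_request(
    "http://localhost:8983/solr", "products", docs
)
print(url)
```

The prepared request can be sent with any HTTP client. In a Java web app, Solrj performs the equivalent steps for you.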
You can start with Solr DIH to index the data from postgres to Solr.
For a more detailed understanding, you can refer to:
how-to-import-data-from-sql-databases-part-1
how-to-import-data-from-sql-databases-part-2
how-to-import-data-from-sql-databases-part-3
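For orientation, DIH is driven by a small XML file (usually called data-config.xml) that names the JDBC data source and the SQL query to run. The sketch below generates a minimal one for Postgres. The JDBC URL, credentials, table name `articles`, and column names are all placeholders, and each column is mapped to a Solr field with the same name, which is an assumption about your schema.

```python
import xml.etree.ElementTree as ET


def build_dih_config(jdbc_url, user, password, table, columns):
    """Generate a minimal DataImportHandler data-config.xml as a string."""
    root = ET.Element("dataConfig")
    ET.SubElement(root, "dataSource", {
        "type": "JdbcDataSource",
        "driver": "org.postgresql.Driver",  # Postgres JDBC driver class
        "url": jdbc_url,
        "user": user,
        "password": password,
    })
    document = ET.SubElement(root, "document")
    entity = ET.SubElement(document, "entity", {
        "name": table,
        "query": f"SELECT {', '.join(columns)} FROM {table}",
    })
    # Map every selected column to a Solr field of the same name.
    for column in columns:
        ET.SubElement(entity, "field", {"column": column, "name": column})
    return ET.tostring(root, encoding="unicode")


# Placeholder connection details, table, and columns.
config_xml = build_dih_config(
    "jdbc:postgresql://localhost:5432/appdb", "solr_reader", "secret",
    "articles", ["id", "title", "body"],
)
print(config_xml)
```

The generated file is referenced from the DIH request handler definition in solrconfig.xml, and the Postgres JDBC driver jar has to be on Solr's classpath.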

Key Points/Challenges while working with Apache Tika and Solr

Recently I got involved in a task, part of which requires using Apache Solr (for document search) and Apache Tika (to extract metadata or plain text from documents).
I haven't integrated Solr and Tika yet, but I have worked with each of them individually. I have a set of questions about Apache Solr and Apache Tika, which may be at a beginner or intermediate level.
With Solr, I have done the following practical exercises: created a dummy database, configured schema.xml, ran the Solr server, wrote a program that fetches documents from the database and stores them in the Solr document index, made a simple client to fetch data from Solr via the JSON interface, and made a program that keeps a MySQL database in sync with Solr's document index.
With Tika, I have done the following: compiled and installed Tika, and understood its document parsing capabilities.
..
My Sample Task statement:
Part of my project requires storing around 100,000 documents for full-text search and searching them via a client interface (browser). The data of these 100,000 documents (DOC, PDF, TXT) is extracted by Apache Tika, pushed to a MySQL database, and later pushed to Apache Solr's document index.
At a simple programmatic level, this task will get done.
I would like to understand the challenges related to managing the index, or anything else in Solr, e.g.:
** At an advanced level, does it require optimizing Solr's open source code?
** Even when Solr works properly, does it present any specific challenges?
** What key things need to be considered initially so that Solr works properly?
** Do you think any extra tool needs to be developed to monitor Solr's operation?
I hope you get the idea of the questions I have.
** Also, I would like to know if you have any experience using Apache Tika with Apache Solr, and any challenges or key things to consider.
Would you recommend any specific sources, or do you have any document or anything else that you feel would be helpful?

Resources