We need to integrate a search engine into our catalog management software platform in SharePoint. The information is stored in multiple databases and a file store (doc, ppt, pdf, ...). Our dev platform is ASP.NET, and we have done some preliminary work on Lucene and found it to be good. However, we just came to know of Solr.
We would like to continue using Lucene, but we need to defend that choice against Solr.
Any help is appreciated.
And sorry for my English.
Lucene is a full-text search library used to provide search functionality to an application; it can't be used as an application by itself. Solr is a complete search engine built around Lucene, providing Lucene's search functionality and more. Solr is a web application that can be used on its own without any development around it.
If you need a search engine that your application calls, I recommend using Solr.
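To make the distinction concrete: with Lucene, your application embeds the library and manages indexing and searching itself, while Solr runs as a separate server that you query over HTTP from any platform (including ASP.NET). Below is a minimal sketch of Lucene used as an embedded library, in Java, assuming a Lucene 7+ style API and a hypothetical local index directory and field names:

```java
// Minimal sketch: Lucene used directly as an embedded library (no server involved).
// Assumes Lucene 7+ APIs; the index directory and field names are hypothetical.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;

public class LuceneEmbedded {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("catalog-index"));
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // Index a document in-process: the application owns the whole lifecycle.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new TextField("title", "Catalog management with Lucene", Field.Store.YES));
            writer.addDocument(doc);
        }

        // Search the same index, still in-process.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query q = new QueryParser("title", analyzer).parse("catalog");
            for (ScoreDoc hit : searcher.search(q, 10).scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("title"));
            }
        }
    }
}
```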
Does Google Cloud Platform have a product to do full-text search via an API with non-web data (such as json or xml documents)? This may seem like a pretty silly question, but the only options I have come across are:
Search inside Google App Engine (only available for Python 2, not Python 3) -- https://cloud.google.com/appengine/training/fts_intro/.
Related to web search only: https://developers.google.com/custom-search/docs/tutorial/introduction
Using a managed Elasticsearch: https://console.cloud.google.com/marketplace/details/google/elasticsearch.
Cloud Firestore explicitly states it doesn't offer this and suggests using Algolia (and gives details on integrating it): https://cloud.google.com/firestore/docs/solutions/search
Is there something I'm missing? I'm basically looking to index and search about a million documents in a sort of free-form type of search. Is this offered as a product from Google outside of App Engine? If so, how can I access it?
You have pretty much covered it there. There is currently no specific Google service for full-text search. As you mentioned, the App Engine Search API is available for Python 2.7 (which will stop being maintained after January 2020) but not for Python 3.
There is one more option you could consider, which is using Lucene for GAE. I found this blog where several possibilities are studied; it could be an interesting read for you.
To sum up, I would recommend Elasticsearch or Algolia, but for the latter you need a Firebase project.
After reading many Solr books and articles all over the net, I now have an idea of the power of this server.
But... how do I integrate it into a real application? For example, a web site written in PHP, etc.
Right now, I understand that Solr returns results as XML, JSON, etc. So to integrate it into a web application, is the "simple" approach to convert this output for rendering in a page, or are there other techniques that avoid this?
In my case, I have to develop a search engine that scans many documents and finds results.
My idea was:
Use Solr to build an index and search documents
Use a web application to show the results
Looking around the net, I haven't found anything that explains how to integrate Solr into a real application; all the reading is about "How to use Solr... with Solr...", nothing about a real integration.
Does anyone have useful resources on how to integrate Solr into a real application, with some clean examples?
Edit: It looks like Apache maintains their own list of recommended client APIs, and their recommended tool for PHP is Google's library (though they refer to it as SolPHP). Given this, I imagine that this is the best place to start.
A Solr library for the programming language you're using could save you some of the trouble in implementing the integration. For instance, if your site is written in PHP, you could try Google's Solr library for PHP.
I have done most of my Solr work in Java, so I have used SolrJ quite a bit. This is a well supported tool because it comes from Apache in parallel with the Solr product itself.
If you are doing work in any other languages, you are likely to find libraries available for them. The amount of time they save you may vary according to the quality of the library itself.
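As an illustration of what such a library buys you, here is a minimal SolrJ sketch, assuming SolrJ 6+ (for HttpSolrClient.Builder) and a hypothetical Solr URL, core name, and field names:

```java
// Minimal SolrJ sketch: query a Solr core and print matching documents.
// The URL, core name "catalog", and field names are hypothetical.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrJExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/catalog").build();

        SolrQuery query = new SolrQuery("title:lucene");
        query.setRows(10);

        QueryResponse response = solr.query(query);
        for (SolrDocument doc : response.getResults()) {
            System.out.println(doc.getFieldValue("id") + " : " + doc.getFieldValue("title"));
        }
        solr.close();
    }
}
```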
When I was using Solr in my project, only my application server (Tomcat, in my case) communicated with the Solr server. I wrote a class that executes GET requests against the Solr server based on the input provided by the end user. When Solr returns XML/JSON to the application server, you can parse it and process it like any other business data (e.g., render an *.html page). So, summing up, the web browser never communicates directly with Solr; everything goes through the application server:
WebBrowser -> GET to application server -> GET to Solr server
show *.html <- parse XML/JSON, render *.html <- return XML/JSON
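A rough sketch of that flow on the application-server side, as a Java servlet. The Solr URL, the core name "docs", the field names, and the use of org.json for parsing are all assumptions for illustration:

```java
// Hedged sketch of the pattern above: a servlet on the application server
// (e.g., Tomcat) forwards the user's query to Solr, parses the JSON response,
// and renders HTML for the browser. Core name "docs" and field names are hypothetical.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.json.JSONArray;
import org.json.JSONObject;

public class SearchServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String userQuery = req.getParameter("q");
        if (userQuery == null) {
            userQuery = "*:*"; // match-all query if the user supplied nothing
        }
        String solrUrl = "http://localhost:8983/solr/docs/select?wt=json&q="
                + URLEncoder.encode(userQuery, "UTF-8");

        // Application server -> Solr
        HttpURLConnection conn = (HttpURLConnection) new URL(solrUrl).openConnection();
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }

        // Parse Solr's JSON response and render *.html for the browser
        JSONArray docs = new JSONObject(body.toString())
                .getJSONObject("response").getJSONArray("docs");
        resp.setContentType("text/html;charset=UTF-8");
        resp.getWriter().println("<ul>");
        for (int i = 0; i < docs.length(); i++) {
            resp.getWriter().println("<li>" + docs.getJSONObject(i).optString("title") + "</li>");
        }
        resp.getWriter().println("</ul>");
    }
}
```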
How do I integrate Kibana with Apache Solr instead of using Elasticsearch?
If it cannot be done, what are the alternatives to Kibana for Solr?
At LucidWorks, we have ported Kibana to work with Solr and released it as open source.
If you want a bundled package, you can download that at http://www.lucidworks.com/lucidworks-silk/.
Our port of Kibana for Solr is bundled with Solr 4.7.0 and can be used as a query engine to build dashboards from indexes within the bundled Solr instance and/or on other Solr instances.
The source code is available at https://github.com/LucidWorks/banana.
We have also included Solr Output Writer for LogStash with that bundle; however, you can use any ETL and indexing mechanism to get time series data into Solr. Links to this github repository are available on the LucidWorks link above.
HUE is an alternative search UI for Solr, though it is not as good as Kibana for search at the moment.
You can use SiLK for sure but you are better off using the fully integrated dashboards module that comes with Lucidworks Fusion. Fusion will save you a ton of time and make it easier to focus on the search stuff that matters - like building a recommender engine, creating data-driven user experience, driving data enrichment with entity recognition and integrating with Big Data software like Hadoop.
I am trying to implement a website search engine in Java as an applet. I have used Nutch as the web crawler and Cassandra as my database; I have to use a NoSQL database (because my teacher requires it). Now my question is: what should I do next to complete my search engine?
I have googled a lot, but most of the sites are about Nutch and Solr, and they build search engines by integrating these two, since Solr itself is somewhat of a database. I don't know what I should do. Do I have to use Solr too to complete my search engine? Is it wise to use two databases (Solr and Cassandra)? Or should I do something else?
Please remember that I have to use Cassandra.
And please first explain if I have understood things the wrong way before giving me a minus mark. :D
I will be really, really thankful for your help; I have gotten somewhat confused.
By the way, does Solr count as a NoSQL database? Excuse me, I am new to all of this.
Check out Solr's Data Import Handler and see if you feel it would work. It allows you to query your database and store the results in Solr, which can then manipulate those results. Nutch also has very good integration with Solr, should you choose to use it.
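If the Data Import Handler turns out not to fit your Cassandra schema, another option is a small indexing job of your own that reads from Cassandra and writes to Solr. A hedged sketch in Java, assuming the DataStax 3.x driver, SolrJ 6+, and hypothetical keyspace, table, and field names:

```java
// Hedged sketch: read crawled rows from Cassandra and index them into Solr with SolrJ.
// The keyspace "search", table "pages", core name, and field names are hypothetical;
// assumes the DataStax 3.x Java driver and SolrJ 6+.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CassandraToSolr {
    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("search");
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/pages").build();

        // Pull crawled pages out of Cassandra and push them into the Solr index.
        ResultSet rows = session.execute("SELECT url, title, body FROM pages");
        for (Row row : rows) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", row.getString("url"));
            doc.addField("title", row.getString("title"));
            doc.addField("body", row.getString("body"));
            solr.add(doc);
        }
        solr.commit();

        solr.close();
        session.close();
        cluster.close();
    }
}
```

With this setup Cassandra stays the system of record and Solr is only the search index, which answers the "two databases" worry: they serve different roles.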
I would like to implement a search engine which should crawl a set of web sites, extract specific information from the pages and create full-text index of that specific information.
It seems to me that Xapian could be a good choice for the search engine library.
What are the options for a crawler/parser to integrate with Xapian?
Would Solr be a better choice than Xapian to integrate with open source crawlers/parsers?
Here's a little comparison between Xapian and Solr.
But if you want to build a crawler, take a look at Nutch. It's extensible with plugins, so you could write a plugin that analyzes the information that you're looking for.
Flax may provide some of what you're looking for.