Cassandra and solr on same node - solr

I am working on architecting a POC Cassandra Datastax enterprise cluster environment. We are going to use solr in combination with Cassandra. Would it be a valid configuration to host both solr and Cassandra on the same physical server?

If you're evaluating DSE, Solr is built into the packages you're using. It's an extremely tight integration that would be tough to replicate on your own. Here's the documentation: https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchIntro.html
It's also worth noting that Solr, in this case, does run co-located with Cassandra for data locality, and to take advantage of C* replication, availability, and some other C* specific benefits.
Most importantly, I suggest checking out this hands-on training: https://academy.datastax.com/courses/ds310-datastax-enterprise-search-apache-solr
If you have any specific questions about the integration, update your question and I'd be happy to help.
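For a feel of what the integration looks like from application code, here is a minimal sketch that builds a CQL statement using the solr_query column that DSE Search exposes. The keyspace, table, and field names (demo.articles, title) are hypothetical, and the JSON form of solr_query is assumed to be available in your DSE version. The helper only builds the statement text; it does not contact a cluster.

```python
import json


def build_solr_query_cql(keyspace, table, q, fq=None, limit=10):
    """Return a CQL string that searches a DSE Search-enabled table.

    Only builds text; executing it requires a DSE node with search enabled.
    Keyspace, table, and field names passed in are the caller's assumptions.
    """
    # solr_query accepts a JSON document: "q" is the main query, "fq" a filter.
    query = {"q": q}
    if fq is not None:
        query["fq"] = fq
    payload = json.dumps(query, sort_keys=True)
    # Single quotes inside a CQL string literal are escaped by doubling them.
    payload = payload.replace("'", "''")
    return "SELECT * FROM {}.{} WHERE solr_query = '{}' LIMIT {}".format(
        keyspace, table, payload, int(limit)
    )


if __name__ == "__main__":
    # Hypothetical table and field, for illustration only.
    print(build_solr_query_cql("demo", "articles", "title:cassandra"))
```

The resulting string can be passed to whichever driver session you already use. Because the search runs on the same nodes that hold the data, no separate Solr endpoint is involved.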

Related

How to test code that uses DSE Cassandra/Solr?

I am working on an application that interfaces with DataStax Enterprise (we are auto-syncing with Solr).
I was wondering how this application can be tested efficiently.
I was considering embedded Cassandra for testing, but the caveat is that we use solr_query to query Cassandra.
The alternative is to set up a test keyspace on the real node and run the tests against that keyspace.
But I would like to write functional test cases that have no dependency on the real Cassandra database.
I would like to know about the best practices that people follow to handle such scenarios.
Cheers,
Utsav
The DataStax java driver does this type of thing using CCM. CCM is a tool to stand up / simulate a small cluster (both DSE and OSS C* are supported) on a single machine.
Check out their code here https://github.com/datastax/java-driver/tree/3.x/testing
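CCM gives you a real, if small, cluster. For tests that should not touch any cluster at all, one complementary approach (a general testing pattern, not something the driver's test suite prescribes) is to hide the solr_query call behind a small interface and substitute an in-memory fake. The sketch below assumes hypothetical names throughout (InMemoryProductSearch, titles_matching), and the fake only mimics a simple substring match, not real Solr query syntax.

```python
class InMemoryProductSearch:
    """Test double standing in for a DSE-backed search component.

    It performs a case-insensitive substring match on one field and
    does not attempt to reproduce Solr query semantics.
    """

    def __init__(self, rows):
        self._rows = list(rows)

    def search(self, field, term):
        term = term.lower()
        return [r for r in self._rows if term in str(r.get(field, "")).lower()]


def titles_matching(search_backend, term):
    """Application code under test: depends only on a `search` method."""
    return sorted(r["title"] for r in search_backend.search("title", term))


if __name__ == "__main__":
    fake = InMemoryProductSearch([
        {"id": 1, "title": "Cassandra basics"},
        {"id": 2, "title": "Solr in action"},
    ])
    print(titles_matching(fake, "solr"))
```

In production, the same search method would wrap a session that executes a solr_query statement. Integration tests against a CCM cluster are still needed to cover the real query syntax.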

Is there any other way to achieve solr clustering other than solr-cloud

I am using Solr 4.3 integrated with Liferay 6.2, and Liferay has some integration problems with SolrCloud. How can I achieve clustering, especially fault tolerance, without using SolrCloud? Is there any alternative to SolrCloud?
Solr's old way of scaling is still in place. It is now called "Legacy Scaling and Distribution", and you will find it under exactly that title in Solr's official documentation.
The documentation says:
This section describes how to set up distribution and replication in
Solr. It is considered "legacy" behavior, since while it is still
supported in Solr, the SolrCloud functionality described in the
previous chapter is where the current development is headed. However,
if you don't need all that SolrCloud delivers, search distribution and
index replication may be sufficient.
This section covers the following topics:
Introduction to Scaling and Distribution: Conceptual information about
distribution and replication in Solr.
Distributed Search with Index Sharding: Detailed information about
implementing distributed searching in Solr.
Index Replication: Detailed information about replicating your Solr
indexes.
Combining Distribution and Replication: Detailed information about
replicating shards in a distributed index.
Merging Indexes: Information about combining separate indexes in Solr.
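To illustrate the "Distributed Search with Index Sharding" topic from the quote: in the legacy model, a query is fanned out by passing a shards parameter to any one Solr instance. The sketch below only builds such a URL. The host names are hypothetical, and fault tolerance still has to come from replicating each shard and routing around dead hosts yourself.

```python
from urllib.parse import urlencode


def distributed_select_url(coordinator, shards, query, rows=10):
    """Build a Solr /select URL that fans the query out to several shards.

    `coordinator` and each entry in `shards` are scheme-less addresses such
    as "solr1:8983/solr" (hypothetical hosts), which is the form the
    `shards` parameter expects.
    """
    params = [
        ("q", query),
        ("rows", str(rows)),
        # Comma-separated list of every shard that should answer the query.
        ("shards", ",".join(shards)),
    ]
    return "http://{}/select?{}".format(coordinator, urlencode(params))


if __name__ == "__main__":
    print(distributed_select_url(
        "solr1:8983/solr",
        ["solr1:8983/solr", "solr2:8983/solr"],
        "title:cassandra",
    ))
```

Whichever instance receives the request merges the per-shard results, so the client can send it to any live node.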

Indefinite Search Cluster (Solr vs ES vs Datastax EE)

PREFACE:
This question is not asking for an open ended comparison of Elastic Search vs. Solr vs. Datastax Solr (Datastax EE). (Though links in comments section for this are welcome).
PROJECT:
I have been building a domain-name-type web service for a while. In doing so, I am realizing how exponentially such a service grows.
BACKGROUND:
I would like to know which specific search platform allows me to save and expand indefinitely. Yes, I realize you can split a Solr shard these days, so if I have a 20-shard SolrCloud I can later split it into 40 (I think? Again... that's not indefinite). I'm not sure about the Elasticsearch side of things. DataStax (EE) seems to be the answer because of Cassandra’s architecture, but (A) since they give no transparency on license price, and I have to disclose my earnings to them, I'm quickly reminded of Oracle's bleed-you-slowly fee strategy, and as a start-up that is a huge deterrent. Also, (B) when they say they integrate full MapReduce with Hive, Sqoop, Mahout, Solr, and Pig, I’m thinking I don’t want to spend a lifetime learning bells and whistles that aren’t applicable to my project. I want a search platform to which I can add 2 billion documents a month (or whatever number) indefinitely, without having to worry that I started a cluster with too few shards upfront.
QUESTION:
Admittedly, my background section is riddled with ignorance that I would like to correct. My intention is not to offend or dilute these amazing technologies. I am simply wondering which of them can scale without my having to worry about outgrowing shards [I took out the word forever here -- thank you per comment below]. Or can any? Not hardware-wise, but in terms of shards. Which platform can I use and not have to worry about future growth, whether it's 20TB or 2PB? Assume the hardware budget for servers, switches, etc. is unlimited.
DataStax Enterprise (DSE) is not a "search platform" per se. One of the features DSE provides is the ability to search data stored in Cassandra. Cassandra is being used to store and access enterprise operational data. The idea is that once you have decided that Cassandra is your preferred data store for your enterprise operational data, the DSE/Solr integration then allows you to perform rich search on that data.
Large enterprises are looking to migrate off traditional relational databases to more modern platforms, such as NoSQL databases like Cassandra, where scalability and distributed computing (including multi-data center support, tunable consistency, and robust operations tools, including the OpsCenter GUI dashboard) are the norm. The Solr integration of DSE facilitates that migration.
With regards to your revenue, that link points to a startup program. That makes the software 100% free if you qualify.

Alternative Solandra for cassandra

Does anyone know an alternative to Solandra for Cassandra?
I can't use a "LIKE" clause, and in my case I will need it all the time.
Thanks.
Datastax provides a "tweaked" version of Apache Solr (which saves data directly into Cassandra instead of flat files) to do real-time full-text search. It's called Datastax Enterprise Solution. Of course, it is not free.
As an alternative, you can couple Cassandra with an Elasticsearch cluster, but that is rather heavy just for text search.
Last but not least, you can try to implement full-text search yourself, using Lucene as the engine and some hand-made Cassandra tables for storage. Good luck, though.
You have 3 options to bring advanced search capabilities to Cassandra:
1. Datastax Solr, as already mentioned.
2. Elassandra = Elasticsearch on Cassandra. https://github.com/strapdata/elassandra and http://www.strapdata.com/ It's a good product; we use it at my company. The community edition is free, and the latest release combines Cassandra 3.11 with Elasticsearch 5.5. Their website also offers a free trial of a hosted solution that you could use for testing.
3. Stratio Lucene plugin: https://github.com/Stratio/cassandra-lucene-index It's free, it works, and we also use it at my company. It's just a jar to drop into the Cassandra lib directory.
Of course, for very basic search needs, you can have a look at SASI too.
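Since the question is specifically about LIKE, here is a minimal sketch of what the SASI route looks like. It assumes Cassandra 3.4 or later (where SASI is available), and the keyspace, table, and column names are hypothetical. The helpers only produce CQL text to run through cqlsh or a driver.

```python
def sasi_index_cql(keyspace, table, column, mode="CONTAINS"):
    """Return CQL that creates a SASI index on one column.

    CONTAINS mode is what allows LIKE '%fragment%' matches; PREFIX mode
    only supports LIKE 'fragment%'.
    """
    index_name = "{}_{}_sasi".format(table, column)
    return (
        "CREATE CUSTOM INDEX IF NOT EXISTS {idx} ON {ks}.{tbl} ({col}) "
        "USING 'org.apache.cassandra.index.sasi.SASIIndex' "
        "WITH OPTIONS = {{'mode': '{mode}'}}"
    ).format(idx=index_name, ks=keyspace, tbl=table, col=column, mode=mode)


def like_query_cql(keyspace, table, column, fragment):
    """Return a substring-match SELECT for a column indexed in CONTAINS mode."""
    # Single quotes inside a CQL string literal are escaped by doubling them.
    escaped = fragment.replace("'", "''")
    return "SELECT * FROM {}.{} WHERE {} LIKE '%{}%'".format(
        keyspace, table, column, escaped
    )


if __name__ == "__main__":
    # Hypothetical demo.users table with a text column called name.
    print(sasi_index_cql("demo", "users", "name"))
    print(like_query_cql("demo", "users", "name", "ann"))
```

This covers simple substring matching only. For tokenized full-text search, relevance scoring, or faceting, the Solr, Elassandra, or Stratio options above are a better fit.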

How to synchronize indexes and repository of Apache Lucene and Solr in Clustered JBoss

I want to run my demo web application, built with EJB and Hibernate, in a JBoss cluster for high availability. The application uses Apache Solr (and one part uses Lucene as well) for text-based search.
I got the clustering information from the official JBoss website, but I cannot find any information about how to sync Solr or Lucene indexes and their data repositories.
I am sure that many people have done clustering with Lucene or Solr. Could anyone point me to a good source on how to synchronize Solr or Lucene directories across multiple JBoss server instances?
I have an embedded Solr deployment, so the HTTP-based Solr Replication that Jayendra suggested below is not possible for me. Is there any other way to do Solr replication with a repeater configuration (i.e. all my nodes will act as both master and slave)?
If you want to copy or sync data repositories for Solr, look at Solr Replication, which lets you sync data repositories across different Solr instances on different machines.
The clustering technology of JBoss and WildFly is based on the Infinispan OSS project.
Infinispan provides a highly efficient distributed storage model, and the project includes an Apache Lucene index storage layer:
http://infinispan.org/docs/dev/user_guide/user_guide.html#integrations:lucene-directory
It should be easy to replace the Solr Directory with this implementation.
