How to implement Solr into Sitecore

How to implement Solr into Sitecore - solr

I have to implement Solr index into Sitecore and I would like to know what is the best approach?
I looked at following approaches:
Capture publish end event (or other events) and then push item to solr index
Implement custom database crawler and get all changes from history table. Then using custom index push data to solr.
Second approach sounds like a way to go (in my opinion). In this case do I need to create a new search index, or search manager?
If anyone's done it before, can you point me into the right direction? Also if you could post some links to articles about sitecore-solr implementation.
UPDATE
Ok, after reading sitecore documentation this is what I came up with :
Create your custom SolrConfiguration class where you can set properties like solrserviceurl, add indexes and its definition (custom solr indexes)
Create SolrIndex and add it (in the config file) to your SolrConfiguration. Which instantiating, solrindex should subscribe to AddEntry event of Sitecore History Manager, and communicate with solr crawlers.
Create custom processor and hook into sitecore initialisation pipeline. Processor should initialize SolrConfiguration (from step 1)
Since everything in your config file in will be build using refrection, you can get instance of your cofiguration based on your config file
How does that sound like. Can I have any comments please?

We've done this on a few sites and tend to have a new "published" solr index and "unpublished" index
We interrupt:
OnItemSaving
Event to push things into the unpublished index (you may not need this, it depends if you want things in preview mode)
OnPublishItemProcessed
We process additions and updates to the published index here, I'm not sure what we do about deletions here without digging right into the code but certainly deal with deletions on the OnItemDelete (mentioned below)
OnItemDelete
We interrupt here to remove things from the published and non-published index (I think we remove from the published index here because Sitecore makes you publish the parent node in order to publish out deletions to the web database)
I hope that helps, I'd post the code if I could (but I'd be scowled at).

In addition to the already posted answer (which I think is a good way to do things) I'll share how we do it.
We basically just took a look at the Sitecore database crawler and decided to do things kind of like how it was doing it.
We utilize a significantly modified version of the Custom Item Generator to facilitate mapping between strongly typed objects and an object that has properties that correspond to our Solr schema. For actual communication with Solr we use SolrNet.
The general idea is that we loop through all the items (starting with the site root) recursively and map them to the appropriate type based on its template. Then we go through an indexing process for that item (some items need to index multiple documents to Solr in our implementation).
This approach is working very well for us except I will note that because we are indexing everything at once, it tends to introduce a slight bit of lag time between publish and the site reflecting any changes made to the index. One oversight we made in the beginning but will be working to fix soon is that we don't have an "unpublished" index (meaning we need to publish the site to see updates). It doesn't impact our solution that much really, but I can definitely see where it would others, so keep that in mind.
We didn't particularly want to get into the deletion of items from the index so we do the indexing as a publish:end event.
I hope this additional insight helps you. As far as I know there's not a whole lot of information out there about this specific combination of products, but I can tell you it's definitely possible and quite useful.

Related

Get information from publishItem to publish end in sitecore

I got a setup with sitecore and solr.
Im looking to gather information (the different TemplatesIds) in publishItem, and then when the publish has ended, call solr with the names which needs to be reindex.
Ive managed to get all the template IDs both using PublishItemProcessor and as a publish:itemProcessed event, where i store the template ids in the PublishContext.CustomData as a Hashset.
But how can i, when the publishing is done get this information i've gathered during publishing? I want to call solr, once, and only once, after everything is published, with information gathered during the publishing.
Hope this makes sense guys, please help out.

You don't need to make a hack to reindex indexes after a publishing.
Sitecore has out of the box this functionality.
You use index update strategies to maintain indexes. You can configure each index with a unique set of index update strategies. You should not specify more than three update strategies per index for performance reasons.
Sitecore provides a varied set of index update strategies, and you can extend this set with more strategies.
All the strategies that are delivered with Sitecore are defined under the following node in the Sitecore.ContentSearch.Solr.Index.IndexName configuration files:
<configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration" />
<strategies hint="list:AddStrategy">
You need to use of these default strategies:
RebuildAfterFullPublish
OnPublishEndAsync
More information about search, indexing and crawling you can find here:
https://doc.sitecore.net/sitecore_experience_platform/setting_up__maintaining/search_and_indexing

How is ElasticSearch supposed to work in CakePHP 3?

I've been trying my very best not to ask any nosy question here in stackoverflow, but it has been almost one week since I got stuck in this problem and I couldn't find any solution.
I already have my working website built with CakePHP 3.2. What the website basically does is scrape Twitter for tweets containing a given search term, check if it's already in my database, and store it if it doesn't yet exist. Twitter's JSON response has this "tweet_id" property, and I've been using that value to check for whether I should ignore or append a specific tweet to my DB. While this might be okay while my database is small, I suspect it's going to slow things down considerably when my tables grow bigger. Thus my need for ElasticSearch.
My ElasticSearch server is running on my Arch Linux install, and I've configured my app to point to the said server. Also, I have my "Type" object named the same way as my "Tweets" table (I followed the documentation until the overview part http://book.cakephp.org/3.0/en/elasticsearch.html). This craps out an "Unknown method "alias" error, and following Google searches led me to creating an alternate pagination class since that was what some found to be the cause of the error (https://github.com/lorenzo/audit-stash/issues/4), which still doesn't fix things.
I'm not sure if I got this right. I installed the ElasticSearch plugin with the assumption that all I have to do is name the Types the same name as my tables, since to me the documentation "implies" that this should be done on top of the Blog Tutorial they did to "improve query performance".
TLDR, how is this supposed to work? Is my above assumption right? Do I name the Types differently and index everything myself? I'm not sure if there's just too much automagic, or I'm just poor at these sort of things. And yes, I'm new to frameworks (but not PHP, among other languages)
Thanks in advance!

Solr Collection Creation and Custom Collection Libraries

I've run into a situation where we sometimes have to completely wipe an index and then re-index a collection. This process of course takes a lot of time. I don't want to allow for any or at least extended downtime in Prod. Thus, I am researching into a way in Solr to create a new Collection that is a copy of an old collection, but without the data. I can re-index this new Collection with little or any service degradation. I then want to use aliases to point the new Collection to the alias that our clients are using so that they will start using the new Collection without even knowing it.
I'm currently running 4.2, but wondering if I shouldn't upgrade to 4.7 in order to support this better. Seems like 4.2 has most of the same Collection API support.
One of the first snags I'm hitting is that the Collection I am copying has a lib folder in it with customer libraries. If possible I want to push these out to the solrhome/lib folder so that they only get loaded once. My problem with this is that if I have different versions of say a custom data importer, then I will run into classloader issues.
Has anyone successfully implemented this kind of scenario and could provide some insight into the pitfalls and successes that you had and what worked for you?
More Details...
I have many different collections that are part of this Solr cloud. I don't want to effect any of the other collections if possible while making changes to the new copied collection.

I'm also having a similar kind of situation where I might to modify solr schema and need to re index the whole data. But I don't have much downtime in production. So, we came up with a solution like..
Let's say I have a SolrCloud1 (existing one), with collection1 (It has it's own structure). I have my application running in different machine. There is a load-balancer in between my SolrCloud1 and application.
Now, create a separate SolrCloud (say, SolrCloud2) with collection1. Maintain the same structure as it was previously. Now, do the re indexing part in this SolrCloud2. When it's done, make available the new SolrCloud under the load-balancer. When the new SolrCLoud2 is up, shut down the SolrCloud1.
Thus without any production down time you'll re index the data. Users won't be able to know anything about this. Hope this will help.

Easy GUI way to add data to a Solr index

I'm working on the front end of an app that uses Solr for data storage. Currently I have an empty index, but it'd (understandably) be a lot easier for me if some dummy data was returned so I could make sure that it's output correctly on the front end.
If I was working with and RDBMS (let's say postgres) I'd open up a GUI (e.g. pgadmin) and type data manually into a few rows to achieve this goal. I have access to the Solr web interface, but I can't see any obvious call to action saying INSERT YOUR DATA HERE. The closest thing I can find to an answer on the web is this SO thread, but it's still not quite the droids easy GUI-based solution I'm looking for.
So, my question is: Is the a way to quickly and easily insert some data equivalent to the RDBMS method mentioned above?

Make sure you have defined a schema in schema.xml.
SOLR does indeed have a (limited) html GUI, which on a local installation is probably found at localhost:8983/solr (default). If you can get to the base admin page, then on the left there is a small combobox where you can select a core/collection. If you click on THAT, then you get a list of options that emerges, and you can pick 'documents' to get a similar GUI to what I think you expect from postgres/RDBMS/whatnot.
http://localhost:8983/solr/#/collection1/documents is the URL on a default SOLR installation that I have. This should work as long as you don't have default cores. (Replace collection1 with your collection name and localhost:8983 with wherever your solr is hosted/the port).

Cakephp System-wide Search Function

I am looking to implement a search function into my system that allows system-wide search that can search through every single model. Is that possible?
I have tried CakeDC Search plugin but it somehow only allows me to search in the particular model that I add the search function into. Also, it seems to be limited to the search fields that I add in the view and I have to keep adding those search fields to enable search for them. What I am looking for is something with just one search box and able to retrieve information from all over the system (eg. Google's basic search with just one search field).
would be great that someone can point me in the right direction or even provide instructions on how to do so as I am fairly new with Cake.
I am using PHPMyAdmin for the database and the latest version of CakePHP. Please do let me know if you need further information as I am not sure what I need to include here.
Thank you.

http://cakedc.com/downloads/view/cakephp_search_plugin
The plugin allows you to attach the behavior to any model you want, and specifiy the table fields that it should take data from to create a search index. (basically when ever you save some data against the model, it runs through all the fields you specify it to look through, creates a data array to save, and saves the search data in its own table, ready to be searched on).
I would suggest giving this a go as it sounds like what you are after. The documentation provies detailed instructions on setting things up.
Hope it helps
Pete

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight