Analysis feature of SOLR web admin - solr

I am wondering whether I can use the "analysis" feature of the Solr web admin (4.1) from my script, that is, get the analyzed result for a given string. I am guessing there is some API that the Solr web admin itself uses.
Alternatively, I would like to find a way to run an analyzer on some strings.

The Analysis admin page just uses the AnalysisRequestHandler behind the scenes to display the results. Please see the link for more details and an example.
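As a rough sketch of calling that handler from a script: the snippet below assumes the field analysis handler is registered at `/analysis/field` (as in the stock example `solrconfig.xml`) and that it accepts the `analysis.fieldtype` and `analysis.fieldvalue` parameters. The base URL, core name, and field type are placeholders, not values from the question.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_analysis_url(base_url, core, field_type, text):
    """Build a request URL for the field analysis handler.

    Assumes the handler is registered at /analysis/field in solrconfig.xml.
    """
    params = urlencode({
        "analysis.fieldtype": field_type,  # field type whose analyzer chain to run
        "analysis.fieldvalue": text,       # the string to analyze
        "wt": "json",
    })
    return f"{base_url}/{core}/analysis/field?{params}"


def analyze(base_url, core, field_type, text):
    """Send the request and return the parsed JSON response."""
    url = build_analysis_url(base_url, core, field_type, text)
    with urlopen(url) as resp:
        return json.load(resp)
```

Calling `analyze("http://localhost:8983/solr", "collection1", "text_general", "Hello World")` against a running instance should return the same per-stage token data that the admin page renders.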

Related

how to communicate with solr while searching?

I am learning how to develop a search application using Solr. I have a website built in HTML that has a search bar.
When the user enters keywords to search for, it has to retrieve the matching records from the data indexed in Solr. My question is how to connect the frontend website with Solr.
Please give me clear steps to implement this.
There are different libraries for communicating with Solr; which one you use depends on your technology stack. Some of them are:
Solarium [PHP] :
Solarium is a PHP Solr client library that accurately models Solr concepts. Where many other Solr libraries only handle the communication with Solr, Solarium also relieves you of handling all the complex Solr query parameters using a well documented API.
https://github.com/solariumphp/solarium
Haystack [Django] :
Haystack provides modular search for Django. It features a unified, familiar API that allows you to plug in different search backends (such as Solr, Elasticsearch, Whoosh, Xapian, etc.) without having to modify your code.
https://django-haystack.readthedocs.io/en/master/
If you are using JavaScript, you could use the Solr REST API directly from the client.
There are various client APIs:
https://lucene.apache.org/solr/guide/6_6/client-apis.html
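Whichever client you pick, underneath it is an HTTP request to the `/select` handler. A minimal sketch of what the server side of the search bar might do, assuming a JSON response (`wt=json`); the host, core name, and helper names here are made up for illustration:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, core, keywords, rows=10):
    """Build a /select URL for the keywords typed into the search bar."""
    params = urlencode({"q": keywords, "rows": rows, "wt": "json"})
    return f"{base_url}/{core}/select?{params}"


def extract_docs(solr_response):
    """Pull the matching documents out of a parsed JSON response."""
    return solr_response.get("response", {}).get("docs", [])


def search(base_url, core, keywords, rows=10):
    """Run the query against Solr and return the list of matching documents."""
    with urlopen(build_select_url(base_url, core, keywords, rows)) as resp:
        return extract_docs(json.load(resp))
```

The web page would then submit the search form to your own backend, which calls something like `search(...)` and renders the returned documents.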

How to automate solr indexing?

Normally we do indexing in Solr from a browser. Can we do it automatically by writing a batch job or Java code?
Please give me some ideas, if it is possible.
You can use the DataImportHandler, which can import from a lot of different sources such as databases or XML files: https://wiki.apache.org/solr/DataImportHandler
If you have specific requirements that are not satisfied by the DataImportHandler, you can implement your own indexer using a Solr client API:
https://cwiki.apache.org/confluence/display/solr/Client+APIs
If you want to do things with Solr programmatically, take a look at SolrJ, an API that will do what you're asking for.
You can use a web debugging proxy such as Fiddler to view the HTTP request that is generated when you trigger the data import via a web browser. Then send the same request from your Java code.
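For the DataImportHandler route, the request the browser sends is typically just a GET with a `command` parameter, so a scheduled job can send the same thing. A small sketch, assuming the handler is registered at `/dataimport`; the host and core name are placeholders, and this uses Python rather than Java only to keep it short:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_import_url(base_url, core, command="full-import", clean=False):
    """Build the URL that triggers a DataImportHandler command.

    Assumes the handler is registered at /dataimport in solrconfig.xml.
    """
    params = urlencode({
        "command": command,                      # e.g. full-import, delta-import, status
        "clean": "true" if clean else "false",   # whether to wipe the index first
        "wt": "json",
    })
    return f"{base_url}/{core}/dataimport?{params}"


def trigger_import(base_url, core, command="full-import", clean=False):
    """Send the command and return Solr's parsed JSON reply."""
    with urlopen(build_import_url(base_url, core, command, clean)) as resp:
        return json.load(resp)
```

A cron entry or Windows scheduled task can then run a script that calls `trigger_import(...)` at whatever interval you need.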

solr faceted search UI development

I am working on a POC where I have to display faceted search results on a web page. Can anybody please suggest what setup I need to configure to display them? I would prefer Java technologies. Just to mention, I have SolrCloud running on a remote server.
I would like to know:
1. Should I use an MVC framework?
2. How will my local machine interact with the remote Solr server?
3. How do I send queries through Java code, and what technology should I use to display the faceted search results?
Any example of how someone is doing this would be very helpful.
Please help me with this.
Thanks,
One of the quickest ways to create your POC is to use the VelocityResponseWriter. This response writer is bundled in the Solr distribution; it is basically a series of Apache Velocity templates that are very easy to customize.
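If you end up rendering the page yourself instead of using the Velocity templates, the facet data comes back from an ordinary `/select` request with `facet=true` and one or more `facet.field` parameters. Here is a sketch of the request and of unpacking the reply, written in Python for brevity (the same parameters apply from SolrJ). It assumes the default JSON layout, where each facet field is a flat list alternating value and count; the host, collection, and field names are invented:

```python
from urllib.parse import urlencode


def build_facet_url(base_url, collection, query, facet_fields):
    """Build a /select URL that asks Solr for facet counts on the given fields."""
    params = [("q", query), ("facet", "true"), ("wt", "json")]
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def parse_facet_counts(solr_response):
    """Turn the flat [value, count, value, count, ...] lists into dicts."""
    fields = solr_response.get("facet_counts", {}).get("facet_fields", {})
    return {
        name: dict(zip(flat[::2], flat[1::2]))
        for name, flat in fields.items()
    }
```

The resulting `{field: {value: count}}` mapping is straightforward to loop over in whatever view technology you choose for the sidebar.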

security threats in exposing solr query to customers in live website

I am going to build a website and I am planning to use Solr for search integration. It is an ecommerce website. I wanted to know whether there is any problem in exposing the Solr query format to the users of this website.
You want to have your search app query Solr, or use a proxy, so the Solr URL is not exposed to the web user. I'm not so concerned about query syntax and parameters being visible, as long as the web user can't send them to Solr directly. You certainly want to make sure only the web app can reach the Solr server, however.
Even if you lock down the RequestHandlers so that only searches are available through the web, there still may be things in your index that you don't want to expose to customers.
For example, if two items score the same in a search, you'd like to boost the one with the higher margin. In order to do that you need to have the margin in your index, and that means it's available for all of your customers and competitors to see.
The JSON response writer is very handy for writing lightweight search apps. At the very least you'll want to implement a filtering proxy between the browser and Solr.
You absolutely do not want to expose Solr directly to users. Nor do you want to pass the format through without evaluation.
One of the things that Solr supports is delete by query. There are other possibilities as well. You have to sanitize the content of queries.
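To make the filtering proxy idea a bit more concrete, here is one possible sketch: the web app accepts only a small whitelist of parameters, escapes query syntax characters in the user's text, and caps the page size before anything is forwarded to Solr. The whitelist, the `MAX_ROWS` limit, and the function names are illustrative assumptions, and the escaped character set is modelled loosely on what SolrJ's `ClientUtils.escapeQueryChars` covers.

```python
# Hypothetical whitelist: only these parameters are ever forwarded to Solr.
ALLOWED_PARAMS = {"q", "start", "rows", "sort"}
MAX_ROWS = 50

# Characters with special meaning in the Lucene/Solr query syntax.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query(text):
    """Backslash-escape query syntax characters in user-supplied text."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def filter_params(user_params):
    """Drop unknown parameters, escape the query, and cap the page size."""
    safe = {k: v for k, v in user_params.items() if k in ALLOWED_PARAMS}
    safe["q"] = escape_query(str(safe.get("q", "")))
    try:
        rows = int(safe.get("rows", 10))
    except (TypeError, ValueError):
        rows = 10
    safe["rows"] = max(0, min(rows, MAX_ROWS))
    return safe
```

Because everything outside the whitelist is dropped, parameters such as `qt` or `stream.body` never reach Solr. This complements, rather than replaces, locking down the update handlers and restricting network access to the Solr server.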

Web Analytics & Stats

We want to add tracking statistics (clicks, page views, unique visits, etc.) to a web application we are building, but we are pretty unsure of how to go about it.
Does anyone have any articles on the best way to incorporate tracking data into an application, e.g. JavaScript tracking, IIS, etc.?
We want to add tracking as an ASP.NET MVC module, but we are unsure of the best way to actually get the data and essentially 'track' this information.
If anyone could help out, it would be much appreciated.
Edit: just to be clear, we want to do this in-house and present the stats to our users as an additional paid module.
You can turn on logging for IIS and then use the SQL Server Report Server Pack for IIS. It comes with many canned reports for your site's stats, and you could take it from there with your own custom reports.
You could also just use Log Parser to get the stats into a SQL Server DB, and then use SQL from there to analyse the data and roll your own app.
Either way, you could modularize this and sell it as an add-on to your customer base.
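To give a feel for what rolling your own from the logs involves, here is a small sketch that counts successful page views per URL from W3C extended format log lines, which is the format IIS writes by default. It reads the column layout from the `#Fields:` directive instead of hard-coding positions; the function name and the choice to count only status 200 responses are assumptions for the example.

```python
from collections import Counter


def count_page_views(lines):
    """Count status-200 requests per URL path in W3C extended log lines."""
    fields = []
    views = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns of the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip other directives, blanks, and entries before #Fields
        record = dict(zip(fields, line.split()))
        path = record.get("cs-uri-stem")
        if path and record.get("sc-status") == "200":
            views[path] += 1
    return views
```

Usage would be something like `count_page_views(open(log_path))`, with the totals then written to whatever store backs your reports.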
You could use Piwik, you just need PHP version 5.1.3 or greater and MySQL version 4.1 or greater. As they say in their website, "Piwik aims to be an open source alternative to Google Analytics."
They have a demo on the official website so you can see if it's what you're looking for.
Google Analytics is a popular service. You just insert a bit of JavaScript containing your site's name on every page, and Google tracks the data and provides all the reports on a handy web-based dashboard.
It's not an ASP.NET MVC module like the one you mentioned, but it will certainly track stats for you and will be a lot simpler to set up than trying to code or integrate anything yourselves.
I'd look at analytics to begin with and only branch out to something more complex if it doesn't meet your requirements.
klabranche provided a thorough answer about using web server logs. I think web server logs are a great way to analyse data about your web application.
That said, depending on your web application and the scope of your analytics, relying on web server logs alone may not be enough.
As you may know, the web log does not record user behaviour such as clicking certain tabs, which may not trigger a web server request. The web log has no idea whether users clicked that tab or not, and this may hurt your analysis.
Another thing you need to know about is the browser cache, which may create another black hole in your data.
RECAP
If you want to do holistic analytics, you need to use two approaches: one is JavaScript tags, the other is the web log. Since each has shortcomings, combining them will give you a more complete picture.
Hope this helps

Resources