Security threats in exposing Solr queries to customers on a live website - solr

I am going to build an e-commerce website and I am planning to use Solr for search integration. Is there any problem with exposing the Solr query format to the users of this website?

Have your search app query Solr, or put a proxy in front of it, so that the Solr URL is never exposed to the web user. I'm not so concerned about the query syntax and parameters being visible, as long as the web user can't send arbitrary ones to Solr. You definitely want to make sure that only the web app can reach the Solr server, however.

Even if you lock down the RequestHandlers so that only searches are available through the web, there still may be things in your index that you don't want to expose to customers.
For example, if two items score the same in a search, you'd like to boost the one with the higher margin. In order to do that you need to have the margin in your index, and that means it's available for all of your customers and competitors to see.
The JSON response writer is very handy for writing lightweight search apps, but at the very least you'll want to implement a filtering proxy between the browser and Solr.
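As a rough illustration of the filtering idea, here is a minimal Python sketch of the response side of such a proxy. It assumes the proxy already has the JSON body returned by Solr; the field names (id, name, price, margin) and the PUBLIC_FIELDS whitelist are hypothetical, not taken from any real schema.

```python
import json

# Hypothetical whitelist: the only fields customers are allowed to see.
PUBLIC_FIELDS = {"id", "name", "price"}


def filter_solr_response(raw_json: str) -> dict:
    """Keep only whitelisted fields from a Solr JSON response body."""
    data = json.loads(raw_json)
    response = data.get("response", {})
    return {
        "numFound": response.get("numFound", 0),
        "docs": [
            {key: value for key, value in doc.items() if key in PUBLIC_FIELDS}
            for doc in response.get("docs", [])
        ],
    }


# Sample body in the general shape produced by the JSON response writer.
sample = json.dumps({
    "responseHeader": {"status": 0, "params": {"q": "shoes"}},
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [
            {"id": "42", "name": "Trail shoe", "price": 79.0, "margin": 0.35}
        ],
    },
})

filtered = filter_solr_response(sample)
```

Whitelisting the fields to keep, rather than blacklisting the ones to hide, means a field added to the index later stays private until someone deliberately exposes it.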

You absolutely do not want to expose Solr directly to users. Nor do you want to pass the format through without evaluation.
One of the things that Solr supports is delete by query, and there are other possibilities as well. You have to sanitize the content of queries.
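One possible way to sanitize input on the server side is sketched below in Python. It backslash-escapes the characters that the Lucene/Solr query syntax treats as special and builds a fixed parameter set, so nothing else supplied by the user reaches Solr. The field list, the MAX_ROWS limit, and the function names are assumptions made for the example.

```python
# Characters with special meaning in the Lucene/Solr query syntax.
SPECIAL_CHARS = set('\\+-!(){}[]^"~*?:/&|')

# Hypothetical limits chosen for this example.
MAX_ROWS = 50
PUBLIC_FIELD_LIST = "id,name,price"


def escape_solr_term(text: str) -> str:
    """Backslash-escape query syntax characters in user input."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def build_safe_params(user_input: str, rows: int = 10) -> dict:
    """Build the complete parameter set sent to Solr.

    Only the search text and a clamped row count come from the user;
    every other parameter is fixed by the application.
    """
    return {
        # Empty input falls back to a match-all query (a design choice).
        "q": escape_solr_term(user_input.strip()) or "*:*",
        "fl": PUBLIC_FIELD_LIST,
        "rows": max(1, min(rows, MAX_ROWS)),
        "wt": "json",
    }


params = build_safe_params("id:* OR margin:[0 TO *]", rows=5000)
```

The result can be passed to urllib.parse.urlencode and appended to the search handler URL. Note that this sketch leaves the operator words AND, OR and NOT untouched; it only neutralizes the punctuation that selects fields, ranges and wildcards.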

Related

How to communicate with Solr while searching?

I am learning how to develop a search application using Solr. I have a website built in HTML that has a search bar.
When the user enters keywords to search for, it has to retrieve the matching records from the data indexed in Solr. My question is how to connect the frontend website with Solr.
Please give me clear steps to implement this.
There are different libraries for communicating with Solr; which one you use depends on your technology stack. Some of them are:
Solarium [PHP] :
Solarium is a PHP Solr client library that accurately models Solr concepts. Where many other Solr libraries only handle the communication with Solr, Solarium also relieves you of handling all the complex Solr query parameters using a well-documented API.
https://github.com/solariumphp/solarium
Haystack [Django] :
Haystack provides modular search for Django. It features a unified, familiar API that allows you to plug in different search backends (such as Solr, Elasticsearch, Whoosh, Xapian, etc.) without having to modify your code.
https://django-haystack.readthedocs.io/en/master/
If you are using JavaScript, you could use the Solr REST API directly from the client.
There are various client APIs:
https://lucene.apache.org/solr/guide/6_6/client-apis.html
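Whichever client you pick, the underlying exchange is an HTTP GET against a search handler followed by parsing the response. A minimal server-side sketch in plain Python is shown below; the host, port, core name ("products") and the /select handler path are assumptions about the installation, and escaping of the user's input is left out to keep the example short.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed location of the Solr core; adjust to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def search(keywords: str, fetch=http_get) -> list:
    """Send the keywords to Solr and return the matching documents.

    `fetch` is injectable so the function can be exercised without a
    running Solr server.
    """
    query = urlencode({"q": keywords, "wt": "json", "rows": 10})
    body = json.loads(fetch(SOLR_SELECT_URL + "?" + query))
    return body["response"]["docs"]
```

The search bar's form would post the keywords to a page or endpoint of your web app, which calls something like search() and renders the returned documents as HTML.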

Validating Solr queries against a schema

I would like to validate queries against a schema before actually executing them.
Is there an official API which will give me access to a schema, or will I have to parse the Solr configuration XML myself?
The usual trick for finding these resources is to open the Admin interface with the developer network tool running in your browser, then navigate to the resource you're looking for while watching which requests your browser performs. Since the frontend is purely JavaScript-based and runs in your browser, it accesses everything through the API exposed by Solr.
You'll have to parse something, probably in either JSON or XML format. For my older 4.10.2 installation, it is available as:
/solr/corename/admin/file?file=schema.xml&contentType=text/xml;charset=utf-8
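Once the schema XML has been fetched, a simple validation step could look like the Python sketch below. It collects the names of the declared field elements and reports any field prefix in a query that the schema does not define. The sample schema is made up for the example, the regular expression is deliberately naive (it does not understand quoted phrases), and dynamic fields are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Made-up schema fragment standing in for the downloaded schema.xml.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
  </fields>
</schema>
"""

# Matches an unescaped "fieldname:" prefix at the start of a word.
FIELD_REF = re.compile(r"(?<![\w\\])([A-Za-z_]\w*):")


def schema_fields(schema_xml: str) -> set:
    """Return the names of all <field> elements declared in the schema."""
    root = ET.fromstring(schema_xml)
    return {field.get("name") for field in root.iter("field")}


def unknown_fields(query: str, known: set) -> set:
    """Return field names used in the query that the schema lacks."""
    return set(FIELD_REF.findall(query)) - known


known = schema_fields(SAMPLE_SCHEMA)
bad = unknown_fields("title:solr AND margin:[0.3 TO *]", known)
```

A query would be rejected before execution whenever unknown_fields() returns a non-empty set.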

Analysis feature of SOLR web admin

I am wondering if I can use the "analysis" feature of the Solr web admin (4.1) in my script, that is, to get the analyzed result for a given string. I am guessing there should be some API that the Solr web admin uses.
Otherwise, I would like to find a way to run an analyzer on some strings.
The analysis Admin page is just leveraging the AnalysisRequestHandler behind the scenes to display the results. Please see the link for more details and an example.
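For use from a script, the request can be built by hand. The Python sketch below only constructs the URL; it assumes the field analysis handler is registered at /analysis/field on the core (check your solrconfig.xml) and uses the analysis.fieldtype and analysis.fieldvalue parameters. The base URL, core name and field type in the usage line are placeholders.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url: str, core: str, field_type: str, text: str) -> str:
    """Build a URL asking Solr to analyze `text` with the given field type.

    Assumes the field analysis handler is mapped to /analysis/field.
    """
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": text,
        "wt": "json",
    }
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


url = field_analysis_url(
    "http://localhost:8983/solr", "collection1", "text_general", "Running shoes"
)
```

Fetching that URL should return the token stream produced by each stage of the analyzer chain, which is the same data the admin page displays.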

Can Google be used for site search on a database backed website?

I'm developing a web site with Google App Engine, and I want to have a search feature for user-submitted content. Since this project is just a toy and I don't control the server, I'd like to just use Google to handle search. However, since the content is stored in the database, I don't think Google can discover the dynamic URLs. Unless maybe I create a page that links to the last N submissions and hope it gets crawled frequently. Thoughts?
Absolutely. As long as the database is exposed in a web page which can be crawled, Google will crawl it (unless told not to).
The best way to make it all accessible is decent navigation between pages. However, lacking that, a site map page linked from the home page should suffice.
This is an excellent candidate for a sitemap.
You can generate the XML any way you want, and give it to Google. The best part is, it is a "private" XML file; no need to have ugly listings of dynamic URLs for users to see.
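A sitemap in the sitemaps.org format can be produced with a few lines of Python, as in the sketch below. The list of URLs would come from your database; the example URL here is made up.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Return a sitemap XML document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when serializing.
        ET.SubElement(url_el, "loc").text = page_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap(["https://example.com/item?id=1&page=2"])
```

Serving this output from a handler such as /sitemap.xml and regenerating it as submissions arrive keeps the dynamic URLs discoverable without linking them from visible pages.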

Web Analytics & Stats

We want to add tracking statistics to a web application we are building but are pretty unsure of how to go about it (e.g. clicks, page views, unique visits, etc.).
Does anyone have any articles on the best way to go about incorporating tracking data into an application, e.g. JavaScript tracking, IIS, etc.?
We want to add tracking as an ASP.NET MVC module, but we are unsure of the best way to actually get the data and essentially 'track' this information.
If anyone could help out - much appreciated.
Edit: just to be clear, we want to do this in-house and present the stats to our users as an additional paid module.
You can turn on logging for IIS and then use the SQL Server Report Server Pack for IIS. It comes with many canned reports for your site's stats, and you could take it from there with your own custom reports.
You could also just use Log Parser to get the stats into a SQL Server DB, and then use SQL from there to analyse the data and roll your own app.
Either way, you could modularize this and sell it as an add-on to your customer base.
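To give a sense of what rolling your own involves, here is a small Python sketch that reads lines in the W3C extended log format used by IIS and counts page views per URL and distinct client IPs. The sample log lines are invented, and treating one IP address as one visitor is a rough approximation.

```python
from collections import Counter


def summarize_w3c_log(lines):
    """Return (page views per URL, number of distinct client IPs)."""
    fields = []
    page_views = Counter()
    visitors = set()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns of the rows that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        page_views[row["cs-uri-stem"]] += 1
        visitors.add(row["c-ip"])
    return page_views, len(visitors)


# Invented sample in W3C extended log format.
sample_log = """\
#Version: 1.0
#Fields: date time c-ip cs-uri-stem sc-status
2024-01-01 10:00:00 10.0.0.1 /home 200
2024-01-01 10:00:05 10.0.0.2 /home 200
2024-01-01 10:01:00 10.0.0.1 /products 200
""".splitlines()

views, unique_visitors = summarize_w3c_log(sample_log)
```

The same aggregation could equally be done in SQL once the log rows have been loaded into a database.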
You could use Piwik; you just need PHP version 5.1.3 or greater and MySQL version 4.1 or greater. As they say on their website, "Piwik aims to be an open source alternative to Google Analytics."
They have a demo on the official website so you can see if it's what you're looking for.
Google Analytics is a popular service. You just insert a bit of JavaScript containing your site's name on every page, and Google tracks the data and provides all the reports on a handy web-based dashboard.
It's not an ASP.NET MVC module like the one you mentioned, but it will certainly track stats for you and will be a lot simpler to set up than trying to code or integrate anything yourselves.
I'd look at Analytics to begin with and only branch out to something more complex if it doesn't meet your requirements.
klabranche provided a holistic answer in terms of using the web server's logs. I think using the web server log is a great way to analyse your web application's data.
That being said, depending on your web application and the scope of your analytics, relying on the web server log alone may not be enough.
As you may know, the web log does not record user behavior, such as clicking certain tabs, that does not trigger a web server request. The log has no way of knowing whether users clicked that tab or not, which may skew your analysis.
Another thing you need to know about is the browser cache, which may create another blind spot in your data.
RECAP
If you want to do holistic analytics, you need to use two approaches: one is a JavaScript tag, the other is the web log. Since both of them have shortcomings, combining them will give you a more complete picture.
Hope this helps

Resources