REST API for SOLR analyzer - solr

I'm going to test my SOLR analyzer and I've found instructions how to do it here: https://cwiki.apache.org/confluence/display/solr/Running+Your+Analyzer.
But I need to check several thousand of words, so I'm going to do it programmatically, not manually. Does SOLR have any REST API to run analyzer?
Thank you!

The Solr Admin page is just a set of static HTML files that uses the REST API offered by Solr behind the scenes. If you watch the Network tab in your browser's developer tools while navigating it, you'll see all the endpoints it talks to.
After doing this on the Analysis page, you can see that it makes requests to three endpoints, one to fetch the HTML, then two new requests to get the schema (for the field list) and one to perform the actual analysis:
http://localhost:8983/solr/corename/analysis/field?wt=json&analysis.showmatch=true&analysis.fieldvalue=asd&analysis.query=asd&analysis.fieldname=content

Related

Access Sitecore DB from API in Console application

I would like to accesss the sitecore DB and items from console application like
Sitecore.Data.Database db = Sitecore.Context.Database
or
Sitecore.Data.Database db = Sitecore.Data.Database.GetDatabase("master")
how do I configure and setup my console application to access the DB as above?
Thanks Everyone for the suggestion, I am really interested in config changes, I used webservice, but it has very limited methods. For example, if I would like create an Item with the template and insert the item with prepopulated value, there is no such option. The reason I am looking for the console apporach is I would like to import the contents from XML or excel sheet and push those to the sitecore tree, eventually use the scheduled task to run the console app periodically. I do not want to copy the entire web.config and app_config. If anyone has already done this, could you please post your steps and necessary config changes?
You have two options I think:
1) Import the Sitecore bits of a website's web.config into your console application's app.config, so that the Sitecore API "just works"
I'm sure I read a blog post about this, but I can't find the reference right now. (I will have another look) But I think the simple but long winded approach is to copy all of the <sitecore/> element and all the separate files it references. I'm fairly sure you can whittle this down to a subset of the config required for data access with a bit of thinking.
2) Don't use the Sitecore API directly, connect to a web service that exposes access to it remotely.
There are a few of these that already exist. Sitecore itself exposes one, Sitecore Rocks has one, and Hedgehog TDS has one too. And you can always write your own (since any web service running inside the Sitecore ASP.Net app can make database calls and report values back and forth - just remember to consider security if this web service might end up exposed externally for any reason)
John West links to some relevant stuff here:
http://www.sitecore.net/Learn/Blogs/Technical-Blogs/John-West-Sitecore-Blog/Posts/2013/09/Getting-Data-Out-of-the-Sitecore-ASPNET-CMS.aspx
-- Edited to add --
I've not found the blog post I remember. But I came across this SO thread:
Accessing Sitecore API from a CLI tool
which refers to this blog post:
http://www.experimentsincode.com/?p=232
which I think gives the info you'll need for option 1.
(And it reminds me that, of course, when you copy the config stuff you have to copy the Sitecore binaries into your app's folder as well)
I would just like to expand on #JermDavis' post and note that Sitecore isn't a big fan of being accessed when not in a web application. However, if you still want to do this, you will need to make sure that you have all of the necessary configuration settings from the web.config and App_Config of your site in your console application's app.config file.
Moreover, you will never be able to call Sitecore.Context in a console application, as the Sitecore Context sits on top of the HttpContext which means that it must be an application and have a valid request for you to use it. What you are looking for is something more along the lines of Sitecore.Configuration.Factory.GetDatabase("master").
Good luck and happy coding :)
This sounds like a job for the Sitecore Item Web API. I use the Sitecore Item Web API whenever I need to access Sitecore data from the master database outside the context of the Content Management server or outside of the context of the Sitecore application. The Web API definitely does not allow you to do everything that the standard Sitecore API does but it can act as a good base and I now extend upon the Web API instead of writing my own custom web services whenever possible.
Thanks to JemDavis's advise.
After I copied the configuration and made changes to config section to get rid of conflicts. I copied almost all of Sitrecore, analytics and lucene dlls, it worked great.
Only thing you have to remember is, copy the app_config folder to the same location where your dlls are.
Thanks again JemDavis....

How can I tell if I'm including google analytics twice?

I have a web app and I include google analytics. My active users seems to of spiked and I'm incredibly paranoid that I'm somehow double counting my analytics.
Is there any way to see if I'm doing this?
As Nebojsa mentioned, you can inspect source and search for ga.js or analytics.js to see if it's in your application twice.
Look through your source code to see if you have the partial rendering in multiple places (ex. header and footer)
Setup another Google Analytics account and test locally if its double counting your visits. See this post for setting up GA on localhost
Use the Google Analytics Tag Assistant to verify that everything is setup correctly. It will tell you if there are any implementation problems, including multiple tracking codes. It also helps with Adwords, re-marketing and other Google product scripts.
Use the Google Analytics Debugger. This would probably be the most helpful to determine if a single hit is being double counted as it walks you though every single function call the analytics urchin makes.
just open source in the browser and look-up for code of analitics...par example
_gaq.push(['_setAccount', ...

why i couldn't see any text in "http://crawlservice.appspot.com/?key=123456&url=http://mydomain.com#!article"?

Ok, i found this link https://code.google.com/p/gwt-platform/wiki/CrawlerSupport#Using_gwtp-crawler-service that explain how you can make your GWTP app crawlable.
I got some GWTP experience, but i know nothing about AppEngine.
Google said its "crawlservice.appspot.com" can parse any Ajax page. Now I have a page "http://mydomain.com#!article" that has an artice that was pulled from Database. Say that page has the text "this is my article". Now I open this link:
crawlservice.appspot.com/?key=123456&url=http://mydomain.com#!article, then i can see all javascript but I couldn't find the text "this is my article".
Why?
Now let check with a real life example
open this link https://groups.google.com/forum/#!topic/google-web-toolkit/Syi04ArKl4k & you will see the text "If i open that url in IE"
Now you open http://crawlservice.appspot.com/?key=123456&url=https://groups.google.com/forum/#!topic/google-web-toolkit/Syi04ArKl4k you can see all javascript but there is no text "If i open that url in IE",
Why is it?
SO if i use http://crawlservice.appspot.com/?key=123456&url=mydomain#!article then Can google crawler be able to see the text in mydomain#!article?
also why the key=123456, it means everyone can use this service? do we have our own key? does google limit the number of calls to their service?
Could you explain all these things?
Extra Info:
Christopher suggested me to use this example
https://github.com/ArcBees/GWTP-Samples/tree/master/gwtp-samples/gwtp-sample-crawler-service
However, I ran into other problem. My app is a pure GWTP, it doesn't have appengine-web.xml in WEB-INF. I have no idea what is appengine or GAE mean or what is Maven.
DO i need to register AppEngine?
My Appp may have a lot of traffic. Also I am using Godaddy VPS. I don't want to register App Engine since I have to pay for Google for extra traffic.
Everything in my GWTP App is ok right now except Crawler Function.
So if I don't use Google App Engine, then how can i build Crawler Function for GWTP?
I tried to use HTMLUnit for my app, but HTMLUnit doesn't work for GWTP (See details in here Why HTMLUnit always shows the HostPage no matter what url I type in (Crawlable GWT APP)? )
I believe you are not allowed to crawl Google Groups. Probably they are actively trying to prevent this, so you do not see the expected content.
There's a couple points I wish to elaborate on:
The Google Code documentation is no longer maintained. You should look on Github instead: https://github.com/ArcBees/GWTP/wiki/Crawler-Support
You shouldn't use http://crawlservice.appspot.com. This isn't a Google service, it's out of date and we may decide to delete it down the road. This only serves as a public example. You should create your own application on App Engine (https://appengine.google.com/)
There is a sample here (https://github.com/ArcBees/GWTP-Samples/tree/master/gwtp-samples/gwtp-sample-crawler-service) using GWTP's Crawler Service. You can basically copy-paste it. Just make sure you update the <application> tag in appengine-web.xml to the name of your application and use your own service key in CrawlerModule.
Finally, if your client uses GWTP and you followed the documentation, it will work. If you want to try it manually, you must encode the Query Parameters.
For example http://crawlservice.appspot.com/?key=123456&url=http://www.arcbees.com#!service will not work because the hash (everything including and after #) is not sent to the server.
On the other hand http://crawlservice.appspot.com/?key=123456&url=http%3A%2F%2Fwww.arcbees.com%2F%23!service will work.

Is it possible using the GAE Logs API to retrive front-end logs only?

How can we we send a query to the Log API such that it only retrieves logs from the front end and not the backends?
I don't know what runtime you're asking about, but looking at the Python source for SDK 1.8.8 you have the following arguments for the google.appengine.api.logservice.fetch function:
module_versions: A list of tuples of the form (module, version), that
indicate that the logs for the given module/version combination should be
fetched. Duplicate tuples will be ignored. This kwarg may not be used
in conjunction with the 'version_ids' kwarg.
(This isn't yet reflected in the Google Developers site)
This does not mean you can directly access front-end logs, but if you convert your app from using backends to using two named modules, one for front-end requests and another for backend work, you'll be able to fetch the logs of each independently.

Search support for Google App Engine Go runtime

There is search support (experimental) for python and Java, and eventually Go also may supported. Till then, how can I do minimal search on my records?
Through the mailing list, I got an idea about proxying the search request to a python backend. I am still evaluating GAE, and not used backends yet. To setup the search with a python backed, do I have to send all the request (from Go) to data store through this backend? How practical is it, and disadvantages? Any tutorial on this.
thanks.
You could make a RESTful Python app that with a few handlers and your Go app would make urlfetches to the Python app. Then you can run the Python app as either a backend or a frontend (with a different version than your Go app). The first handler would receive a key as input, would fetch that entity from the datastore, and then would store the relevant info in the search index. The second handler would receive a query, do a search against the index, and return the results. You would need a handler for removing documents from the search index and any other operations you want.
Instead of the first handler receiving a key and fetching from the datastore you could also just send it the entity data in the fetch.
You could also use a service like IndexDen for now (especially if you don't have many entities to index):
http://indexden.com/
When making urlfetches keep in mind the quotas currently apply even when requesting URLs from your own app. There are two issues in the tracker requesting to have these quotas removed/increased when communicating with your own apps but there is no guarantee that will happen. See here:
http://code.google.com/p/googleappengine/issues/detail?id=8051
http://code.google.com/p/googleappengine/issues/detail?id=8052
There is full text search coming for the Go runtime very very very soon.

Resources