Solr Config API: json overlay affecting managed-schema?

I'm using Solr 6.1.0 in a local environment. When I use the Config API to change the behaviour of solr.extraction.ExtractingRequestHandler, it somehow affects other fields in the index and adds extra fields to managed-schema.xml.
A few fields are affected, always in the same way: content_type disappears from the query results (though it is still in the schema) and Content-Type appears instead, having been added to managed-schema. My <solr_url>/config/overlay looks like this:
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "overlay":{
    "znodeVersion":0,
    "requestHandler":{"/update/extract":{
        "name":"/update/extract",
        "class":"solr.extraction.ExtractingRequestHandler",
        "defaults":{
          "fmap.content":"content",
          "wt":"json",
          "indent":true},
        "useParams":"fmap.content"}}}}
The indexing works fine (and is using content_type, as expected) when this overlay is not there. I'm sure I made a mistake somewhere, but I have no idea where (and why).

You have useParams=fmap.content. useParams is a reference to a named set of additional request parameters, not a parameter itself. Here the set's name happens to be the name of an actual parameter, which might be confusing things.
So you may have a params.json file with a section named fmap.content that defines some parameters, including ones that override defaults set elsewhere.
Specifically, by default you apparently have the lowernames parameter set to true, which is what maps Content-Type to content_type, and your override disables it.
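
To illustrate why losing lowernames changes the field name, here is a minimal Python sketch that approximates the mapping (lowercase the name and replace non-alphanumeric characters with underscores). The function name map_field_name is hypothetical and this is not Solr's actual code, only an approximation of the documented behaviour.

```python
import re


def map_field_name(name, lowernames):
    """Approximate the field-name mapping controlled by `lowernames`.

    Hypothetical helper: it mimics the documented effect, not Solr's source.
    """
    if not lowernames:
        # Without lowernames the extracted metadata name is used unchanged.
        return name
    # Lowercase, then turn anything that is not a-z, 0-9 or "_" into "_".
    return re.sub(r"[^a-z0-9_]", "_", name.lower())


print(map_field_name("Content-Type", lowernames=True))
print(map_field_name("Content-Type", lowernames=False))
```

With lowernames enabled the metadata field lands in content_type; without it, the raw name Content-Type is used, which matches the extra field showing up in the schema.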

Related

How to query solr field for a substring

My use case:
I have a single-valued field called cqpath. It is a text field with values that look something like the following:
"/content/domain/en/path/to/some/page"
"/content/domain/en/path/to/another/page"
"/content/domain/en-us/path/to/some/page"
"/content/domain/en-us/path/to/another/page"
I want a query that returns only items 1 and 2. I have been trying something along the lines of:
cqpath: "/content/domain/en"
which turns out to be wrong, since it retrieves items 3 and 4 as well. Can anyone think of a way to write a query that returns only items 1 and 2, and not 3 and 4?
The field uses a normal text field type. I'd really appreciate your help.

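Starting from Solr 4.0 you can use a regex query. You can find some useful examples here.
In your case, assuming the whole path is indexed as a single untokenized term, you can get the results you're looking for with something like the query below. The slash after en is what excludes the en-us paths, and slashes inside the regex have to be escaped because a slash also delimits it:
cqpath:/.*content\/domain\/en\/.*/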
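
As a rough check of the regex idea, the following Python sketch runs two patterns over the four sample paths from the question. It uses Python's re module with fullmatch to imitate the whole-term matching of Lucene regex queries; the syntax of the two engines differs in details, so treat this only as an illustration. The helper name matching is hypothetical.

```python
import re

paths = [
    "/content/domain/en/path/to/some/page",
    "/content/domain/en/path/to/another/page",
    "/content/domain/en-us/path/to/some/page",
    "/content/domain/en-us/path/to/another/page",
]


def matching(pattern, values):
    """Return the values that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.fullmatch(v)]


# "en" followed by anything also accepts "en-us".
loose = matching(r".*content/domain/en.*", paths)
# Requiring a slash right after "en" keeps only the "en" paths.
strict = matching(r".*content/domain/en/.*", paths)

print(len(loose))
print(strict)
```

A pattern that only requires en followed by anything also matches the en-us paths, so the path separator after en has to be part of the pattern.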

It looks like you are trying to match partial paths with boundaries on path elements (slashes). The usual generic solution is to tokenize at index time to generate all the alternative prefixes, and not to tokenize at query time, so the field type declaration is not symmetric. There are examples of that in the Solr distribution. You would look at using something like an index-time-only EdgeNGramFilterFactory instead of the much more expensive regex matching.
For your specific case, you may want to look at testPathHierarchyTokenizer, which does that for you automatically.
And if your content were more like full URLs than just paths, you might also be interested in a custom update request processor chain that includes the URLClassify URP. It is not well documented, but it mentions generating URL parts, which is what I think you would want.
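
The index-time path tokenization described above can be sketched in Python as follows. This is a simplified model of what a path-hierarchy tokenizer emits, not Solr code; path_prefixes and the in-memory index dictionary are hypothetical names used only for illustration.

```python
def path_prefixes(path, delimiter="/"):
    """Emit every ancestor prefix of a path, one per path element."""
    prefixes = []
    current = ""
    for part in path.split(delimiter):
        if not part:
            continue  # skip the empty piece before the leading slash
        current += delimiter + part
        prefixes.append(current)
    return prefixes


docs = [
    "/content/domain/en/path/to/some/page",
    "/content/domain/en/path/to/another/page",
    "/content/domain/en-us/path/to/some/page",
    "/content/domain/en-us/path/to/another/page",
]

# Index time: each document is indexed under all of its prefixes.
index = {doc: set(path_prefixes(doc)) for doc in docs}

# Query time: the query string is NOT tokenized, it is looked up as one term.
query = "/content/domain/en"
hits = [doc for doc in docs if query in index[doc]]
print(hits)
```

Because the en-us documents produce the token /content/domain/en-us rather than /content/domain/en, the exact-term lookup returns only the two en documents, without any regex matching.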

solr highlighting in old and new versions

I am migrating a web site from an old version of solr (1.4.1) to the current release version (5.2.1) on a different machine and noticing some differences.
In the old version, I could get highlighting with a url like this:
http://localhost:8983/solr/select?indent=on&q=text:software/&start=0&rows=10&fl=id,score,title&wt=json&hl=on&hl.fragsize=200
In the new version, one thing that's different is I need to specify a collection. Another difference is that the new version gives an error if I put text: in front of the value of q.
So, taking into account those differences, I end up with a URL like this:
http://localhost:8983/solr/default/select?indent=on&q=software/&start=0&rows=10&fl=id,score,title&wt=json&hl=on&hl.snippets=1&hl.fl=%2a&hl.fragsize=200
That second URL does not give me highlighting fragments/snippets. That is to say, where the old URL would give something like this:
"highlighting":{
"document0_id":{"text":["The <em>software</em> is awesome"]}}
The new URL gives something like this:
"highlighting":{
"document0_id":{}}
What do I need to do to get highlighting fragments returned in solr 5.2.1?
[edited]
In addition, I tried selecting a single document by its id on both machines. On the old machine, a url like
http://localhost:8983/solr/select?wt=json&indent=true&q=id:thedocumentid
returns some JSON that includes a text field containing the full searchable text of the original HTML document. On the new machine a similar url (but one that includes the collection):
http://localhost:8983/solr/default/select?wt=json&indent=true&q=id:thedocumentid
...returns similar JSON that does not include the text field.
I note that searching returns the correct results; the problem is that on the new machine, the results do not include the highlighting fragments.
So it seems like maybe the issue is that I need to specify that these documents have a text field when I index them; how do I do that?

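A colleague (not tempted by the bounty) noticed that my text field had stored="false" in my schema.xml and suggested changing it to true. That did the trick. Solr builds highlighting snippets from a field's stored value, so a field that is indexed but not stored can be searched but not highlighted, and it is also not returned with the document.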

In the first query you are searching specifically in the text field, and in the second you are not.
Also, in the second you set hl.fl, which "Specifies a list of fields to highlight. Accepts a comma- or space-delimited list of fields for which Solr should generate highlighted snippets. If left blank, highlights the defaultSearchField"
Try again with these changes:
http://localhost:8983/solr/default/select?q=text:software&start=0&rows=10&fl=id,score,title&wt=json&hl=on&hl.fragsize=200
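
For building such a request programmatically, here is a small Python sketch using only the standard library. The host, collection name, and field names come from the question; build_select_url is a hypothetical helper, and setting hl.fl=text explicitly is an assumption added here rather than part of the URL above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base, query, highlight_field):
    """Build a select URL that searches and highlights one named field."""
    params = {
        "q": query,
        "start": 0,
        "rows": 10,
        "fl": "id,score,title",
        "wt": "json",
        "hl": "on",
        "hl.fl": highlight_field,  # assumption: name the field explicitly
        "hl.fragsize": 200,
    }
    return base + "?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/default/select", "text:software", "text"
)
print(url)
```

Naming the field in hl.fl avoids relying on a default, but it only helps if that field is stored, as the other answer points out.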

Trigger Rule with Filefield 'when field has changed' in Drupal 7

I've got a set of rules that check whether various field types have been updated 'after updating existing content'. Each one works fine except the one for the only filefield type. The condition used is a 'NOT Data comparison' on the field, comparing the 'node-unchanged' version against the new 'node' version. This works and triggers the appropriate actions for every other field type I have used, just not for the filefield type: that rule fires whether or not the filefield has changed.
I also found this post about a very similar problem: https://www.drupal.org/node/1011014#comment-10040082
I think I have the makings of a workaround, but I just wanted to check this with some fellow developers first as my PHP isn't the best.
If I were to enable the PHP module and then add a condition in Rules that checks the 'source' attribute to see whether a new file has been added, would this work? The code I have is:
if (isset($object->field_FILEFIELD_NAME[0]['source'])) {
  // Check for new files
}
I believe that $object is the node passed by the rule function as an argument.
Is this a good idea/best approach? Any ideas or workarounds would be great.

Yes, you can use PHP for the filefield check, as it is clearer (though always less secure). Here are my suggestions:
Create a Rule component (of type "Condition set (AND)") instead of a rule to do this check.
Use a parameter with your rule component (of type Node) so you have the $node available.
Use PHP to do the checks. Get the file data from the filefield using $node.
Create a normal rule (e.g. with the event Before Saving Content) where you will use this component as a condition among others.
As you said, you need to check for differences, not for empty values, right? So instead of using isset you need to see whether there is a different fid (which normally changes when there is a different file). Other methods are available, and if you want to see which data can change, do a dpm() on the filefield using the devel module.
To get the unchanged value of the filefield, use the same component in your main rule but with the $unchanged node as the parameter.
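
The fid comparison suggested above can be sketched in a language-neutral way. The snippet below is Python rather than Drupal PHP, and it models a node as a plain dictionary; that layout, the field name field_attachment, and the helper names are all hypothetical and only illustrate the comparison logic.

```python
def file_ids(node, field_name):
    """Collect the file ids (fid) stored in a filefield of a node-like dict."""
    return sorted(item["fid"] for item in node.get(field_name, []))


def filefield_changed(unchanged, updated, field_name):
    """True when the set of attached files differs between the two versions."""
    return file_ids(unchanged, field_name) != file_ids(updated, field_name)


unchanged = {"field_attachment": [{"fid": 12}]}
same = {"field_attachment": [{"fid": 12}]}
replaced = {"field_attachment": [{"fid": 15}]}

print(filefield_changed(unchanged, same, "field_attachment"))
print(filefield_changed(unchanged, replaced, "field_attachment"))
```

Comparing ids rather than checking whether a value is set is what distinguishes "a different file was uploaded" from "the field has a file".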

Do not allow ".xml"/".html"/"index" in URI?

I'm going through Lift's basics in Section 3.2 SiteMap of Simply Lift and one thing struck me.
Using the default SiteMap code, you can ask for, say, info view in three ways:
GET /info,
GET /info.html,
GET /info.xml (why?).
What is more, you can request index view in four different ways:
GET /,
GET /index,
GET /index.html,
GET /index.xml.
How can I limit this behaviour to GET / for directories and GET /info for files?
P.S. All of these return 200 OK:
foursquare.com/,
foursquare.com/index,
foursquare.com/index.html,
foursquare.com/index.xml.
Shouldn't one resource have one URL only?

There are actually more than four ways that it can be parsed. The full list of known suffixes (any of which can be used to access the page) can be found here.
I think the reason is that Lift can be used to serve any resource, so most suffixes are explicitly added by default.
I think you could disable Lift's processing of all extensions by adding this to Boot.scala:
LiftRules.explicitlyParsedSuffixes = Nil
However, I wouldn't recommend that as there may be some side-effects.
Using Req with RestHelper you can specify the suffix explicitly, but I don't know if there is such a construct to do so with Sitemap.

Actually, the code that determines whether Lift should handle the request is here. You can see the default extensions in the liftHandled method directly above it, but they can all be overridden with LiftRules.liftRequest. Something like:
LiftRules.liftRequest append {
  case r => r.path.suffix.trim == ""
}
Should do the trick.
As for why it works that way, Jason is right that Lift is designed to handle multiple types of dynamic resource.

CakePHP RequestHandler: setContent/renderAs/respondAs .. what?

Can someone please explain these functions:
RequestHandlerComponent::renderAs()
RequestHandlerComponent::respondAs()
RequestHandlerComponent::setContent()
It feels slightly redundant to have all three of them (as public methods anyway). If I want to respond to a request with a PDF file, does that mean I'd have to call all three functions? How should I use these in my controller?

They're all different. From the API Docs:
renderAs
Sets the layout and template paths for the content type defined by $type.
I.e. more or less a shortcut for $this->layout = '...' and $this->render(...).
respondAs
Sets the response header based on type map index name. If DEBUG is greater than 2, the header is not set.
Outputs header(...).
setContent
Adds/sets the Content-type(s) for the given name. This method allows content-types to be mapped to friendly aliases (or extensions), which allows RequestHandler to automatically respond to requests of that type in the startup method.
Doesn't actually do anything to the output, just allows you to add new types that are not defined by default.
For outputting a PDF (assuming you have it as a file already) you should actually use a Media View.
