Is it possible to use different Solr cores for different subtrees within a domain in TYPO3? - solr

Background: Our page tree uses neither language overlays nor separate domains for each language. Instead, the different languages are represented by subtrees below the domain's root page.
Page tree:
example.com (root page + sys_domain)
  de
    Seite 1
    Seite 2
  en
    Page 1
    Page 2
We're using TYPO3, EXT:solr and Solr.
Is it possible to use different Solr cores for different subtrees within a domain in TYPO3?
So far, I have found a solution for handling different languages and cores based on TypoScript conditions ([globalVar = GP:L = 1]), but that's not our use case.
I also found a six-year-old question that describes exactly this use case, but it has no positive answer (https://forum.typo3.org/index.php/t/158570/).
Can somebody give me a hint? Is this use-case possible?

Yes, this is possible with version 7 of EXT:solr
Set "Use as Rootpage" on page "de" and page "en"
Remove "Use as Rootpage" from the root page if it is set (if you use EXT:realurl, make sure to set the root page explicitly in its configuration)
Add the following lines to your AdditionalConfiguration.php:
```php
$GLOBALS['TYPO3_CONF_VARS']['EXTCONF']['solr']['sites']['<uid of page "de">']['domains'] = ['your.domain.de'];
$GLOBALS['TYPO3_CONF_VARS']['EXTCONF']['solr']['sites']['<uid of page "en">']['domains'] = ['your.domain.de'];
```
Configure the cores for pages "de" and "en" in TypoScript constants as needed:
```
plugin.tx_solr {
  solr {
    scheme = http
    host = <your-solr-host>
    port = 80
    path = <your-solr-core>
  }
}
```
Then you need two scheduler tasks to index the site: one for "de" and one for "en".
Works pretty nice here :-)
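As an illustration of how the two root pages end up with separate connections, here's a minimal Python sketch that assembles a connection URL from the scheme/host/port/path constants for each root page. The page uids (2 and 5), the host name and the /solr/<core>/ paths are hypothetical placeholders, and this is not how EXT:solr builds its connections internally.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SolrConnection:
    """Mirrors the plugin.tx_solr.solr constants: scheme, host, port, path."""
    scheme: str
    host: str
    port: int
    path: str

    def url(self) -> str:
        # Normalise the path so it always starts and ends with a slash.
        path = "/" + self.path.strip("/") + "/"
        return f"{self.scheme}://{self.host}:{self.port}{path}"


# Hypothetical uids of the "de" and "en" root pages and hypothetical core paths.
CONNECTIONS: Dict[int, SolrConnection] = {
    2: SolrConnection("http", "solr.example.com", 80, "/solr/core_de/"),
    5: SolrConnection("http", "solr.example.com", 80, "/solr/core_en/"),
}

if __name__ == "__main__":
    # List one connection per root page (each indexed by its own scheduler task).
    for root_uid, conn in sorted(CONNECTIONS.items()):
        print(root_uid, conn.url())
```

Keeping one connection per root page is what lets each subtree be indexed into, and searched from, its own core.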

In my experience, it is most probably not possible.
The reason is that the EXT:solr tools need one common TypoScript configuration that works in both backend and frontend context at the same time.
That is why they chose to evaluate the TypoScript configuration not based on a frontend request (which would make it possible to vary the configuration at any position in the page tree) but based on pages marked as root pages.
So the configuration parser first looks for such pages and then evaluates the TypoScript setup at that specific level of the tree.
I think your only option is to make fairly deep changes to the mechanism described above. That is possible; I did it myself to exclude some root-level pages because I had many of them.
This way you can probably instruct the engine to fetch the TypoScript setup from whatever fixed position in the tree you want.
If you want to try this, this is the class method to extend: ApacheSolrForTypo3\Solr\ConnectionManager::getConfiguredSolrConnections()
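The root-page lookup described above can be modeled roughly as follows. This is a simplified Python sketch, not EXT:solr's actual code: each page walks up its rootline until it reaches a page flagged as root page, and that root page decides which configuration (and therefore which core) applies. The page uids, the flags and the core names are hypothetical and follow the page tree from the question with "Use as Rootpage" set on "de" and "en".

```python
from typing import Dict, Optional, Tuple

# Hypothetical page records: uid -> (parent uid, title, "Use as Rootpage" flag).
PAGES: Dict[int, Tuple[int, str, bool]] = {
    1: (0, "example.com", False),  # flag removed from the domain root
    2: (1, "de", True),
    3: (2, "Seite 1", False),
    4: (2, "Seite 2", False),
    5: (1, "en", True),
    6: (5, "Page 1", False),
    7: (5, "Page 2", False),
}

# Hypothetical core per root page.
CORE_BY_ROOT: Dict[int, str] = {2: "core_de", 5: "core_en"}


def find_site_root(uid: int) -> Optional[int]:
    """Walk up the rootline until a page flagged as root page is found."""
    while uid != 0:
        parent, _title, is_siteroot = PAGES[uid]
        if is_siteroot:
            return uid
        uid = parent
    return None


def core_for_page(uid: int) -> Optional[str]:
    """Return the core configured on the page's nearest root page, if any."""
    root = find_site_root(uid)
    return CORE_BY_ROOT.get(root) if root is not None else None
```

For example, core_for_page(3) resolves "Seite 1" to "core_de", while the domain root itself (uid 1) no longer maps to any core.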

Related

Solr - Bringing back snippets from indexed data

I have a Solr/Lucene setup in which I have indexed a set of documents (MS Word files) and can happily search their content. However, I would like to return a snippet from the document content that shows where the matching line is (+/- 5 words around the match term). I have tried following various search results, but my index does not seem to give direct access to the "content".
Can anyone give me some basic pointers to where I might have gone wrong? I have based all my work so far on the guidance and examples in the Solr Reference Guide, so I am not sure whether the issue is in the search parameters or in the original index.
I am doing this to create a clear set of user requirements for building an end solution rather than building the end solution myself, so I am no expert on the tools and do not need to become one; I just need to show what is possible with this tool set.
As MatsLindh noted above, the issue was that the configuration was not mapping the actual content from the Tika parse into a specific field, so there was no full text to display and highlight.
To resolve this, I followed the link (https://lucene.apache.org/solr/guide/7_1/uploading-data-with-solr-cell-using-apache-tika.html#configuring-the-solr-extractingrequesthandler) to the documentation, reviewed the section on fmap, and used the example given for Last Modified Date as a guide.
I then opened the solrconfig.xml file in the relevant core folder and added the following line beneath an existing fmap entry:
<str name="fmap.content">testcontent</str>
I had previously created the testcontent field in my core through the Solr web interface. I then re-ran my indexing command from a command prompt, and that did the trick: the basic content was extracted and the matches were wrapped in basic emphasis.
Thanks to everyone for the input. There is still a lot more I want to test to develop a clear set of requirements, but this really helps prove that the basics are not complicated.
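Once the content lands in a field, snippets are requested with Solr's standard highlighting parameters. The following Python sketch only builds such a query URL; the base URL, the core name "mycore" and the search term are hypothetical, and testcontent is the field from the answer. Note that hl.fragsize is an approximate length in characters, not words, so "+/- 5 words" can only be approximated, and the field must be stored for Solr to highlight it.

```python
from urllib.parse import urlencode


def build_highlight_url(base: str, core: str, term: str,
                        field: str = "testcontent", fragsize: int = 60) -> str:
    """Build a /select URL that asks Solr to highlight `term` in `field`."""
    params = {
        "q": f"{field}:{term}",
        "fl": "id",                 # return only the id; snippets come back in the highlighting section
        "hl": "true",               # enable highlighting
        "hl.fl": field,             # field(s) to build snippets from
        "hl.fragsize": fragsize,    # approximate snippet size in characters, not words
        "hl.snippets": 1,           # at most one snippet per document and field
    }
    return f"{base.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_highlight_url("http://localhost:8983/solr", "mycore", "contract"))
```

The snippets are returned keyed by document id, separate from the main result list, which is why fl can stay minimal.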

How can I keep changes in the index when I use DIH fullimport?

I'm using Solr 6.5 to index files from multiple FTP sources into multiple cores (one core for each type of document: audio, image, software, video and documents).
I'm doing this to populate an app whose front end has a social-networking approach in which every user can add new tags or modify other metadata without restriction.
So when I run the data import handler again to add new files to my application, it erases the changes users made in the index and resets the documents to the defaults from the data-config configuration.
My question: is there a way to tell DIH to skip a document if its id already exists and only add the files that don't have an id in the index yet?
If this is not possible, can I do something similar in a different way?
Thanks for everything!
It sounds like you are doing a full import with the default settings. One of them is clean, which defaults to true and deletes the whole index before the import.
Try setting it to false, and also look at preImportDeleteQuery and postImportDeleteQuery for more precise control.
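A full import with cleaning disabled is triggered through the DIH request handler's URL parameters. This Python sketch only builds that URL; it assumes the handler is registered at /dataimport in solrconfig.xml (the usual name, but configurable), and the core name "audio" is hypothetical. Keep in mind that clean=false only protects documents the import does not touch: a re-imported document whose uniqueKey already exists still replaces the stored one, so user edits on those documents would be lost unless the import query leaves them out.

```python
from urllib.parse import urlencode


def dih_full_import_url(base: str, core: str, clean: bool = False,
                        commit: bool = True) -> str:
    """Build a DataImportHandler full-import URL.

    clean=false keeps documents that are not part of this import instead of
    wiping the whole index first.
    """
    params = {
        "command": "full-import",
        "clean": str(clean).lower(),    # Solr expects "true"/"false"
        "commit": str(commit).lower(),
    }
    return f"{base.rstrip('/')}/{core}/dataimport?{urlencode(params)}"


if __name__ == "__main__":
    print(dih_full_import_url("http://localhost:8983/solr", "audio"))
```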

Do not allow ".xml"/".html"/"index" in URI?

I'm going through Lift's basics in Section 3.2 SiteMap of Simply Lift and one thing struck me.
Using the default SiteMap code, you can request, say, the info view in three ways:
GET /info,
GET /info.html,
GET /info.xml (why?).
What's more, you can request the index view in four different ways:
GET /,
GET /index,
GET /index.html,
GET /index.xml.
How can I limit this behaviour to GET / for directories and GET /info for files?
P.S. All of these return 200 OK:
foursquare.com/,
foursquare.com/index,
foursquare.com/index.html,
foursquare.com/index.xml.
Shouldn't one resource have one URL only?
There are actually more than four ways that it can be parsed. The full list of known suffixes (any of which can be used to access the page) can be found here.
I think the reason is that Lift can be used to serve any kind of resource, so most suffixes are explicitly added by default.
I think you could disable Lift's processing of all extensions by adding this to Boot.scala:
LiftRules.explicitlyParsedSuffixes = Nil
However, I wouldn't recommend that as there may be some side-effects.
Using Req with RestHelper you can specify the suffix explicitly, but I don't know whether SiteMap has a similar construct.
Actually, the code to determine whether Lift should handle the request or not is here. You can see the default extensions in the liftHandled method directly above, but they can all be overridden with LiftRules.liftRequest. Something like:
LiftRules.liftRequest append {
case r => Full(r.path.suffix.trim == "")
}
Should do the trick.
As far as why it works that way, Jason is right that Lift is designed to handle multiple types of dynamic resource.
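To make the effect of the liftRequest rule concrete, here's a simplified Python model (not Lift's code) of the decision: the last path segment is split into name and suffix only when the suffix is in a known list, and the request counts as handled by Lift only when no suffix was split off. The suffix set below is a small hypothetical subset of Lift's real list.

```python
from typing import Set, Tuple

# Hypothetical subset of known suffixes; Lift's default list is longer.
KNOWN_SUFFIXES: Set[str] = {"html", "htm", "xhtml", "xml"}


def split_suffix(last_segment: str, suffixes: Set[str]) -> Tuple[str, str]:
    """Split 'index.xml' into ('index', 'xml') if the suffix is known."""
    name, dot, suffix = last_segment.rpartition(".")
    if dot and name and suffix in suffixes:
        return name, suffix
    return last_segment, ""


def handled_by_lift(path: str, suffixes: Set[str] = KNOWN_SUFFIXES) -> bool:
    """Mimic the rule `case r => Full(r.path.suffix.trim == "")`."""
    last = path.rstrip("/").rsplit("/", 1)[-1]
    _name, suffix = split_suffix(last, suffixes)
    return suffix.strip() == ""
```

With this model, /info and / are handled while /info.html and /index.xml are not, so those would typically fall through to the servlet container. Note that /index is still handled, so the rule alone does not collapse /index onto /.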

Is it possible for an entry to have two URL in Expression Engine, and translate template names?

I'm currently building a bilingual ExpressionEngine 2.5.2 website. I'm using this technique to create the two languages, which works perfectly.
I have created a {country_code} global variable in the two index.php files which allows me to detect the current language.
With this technique, I have no problem getting language-specific data when accessing an entry. My only concern is that I apparently have to favor one language for the "clean" URL.
Example entry:
{entry_id} = 123
{title} = My test article
{title_permalink} = my-test-article
{name_fr} = Mon article
{name_en} = My article
If I request http://www.example.com/index.php/en/blog/articles/my-test-article, I expect to find "My article", in English, rendered by the articles template in the blog template group.
That works, but the French translation is served at http://www.example.com/index.php/fr/blog/articles/my-test-article, whereas the correctly translated URL would be http://www.example.com/index.php/fr/blogue/articles/mon-article-test.
Anyone encountered a problem like this? Any solutions via extensions or modules?
I believe the Transcribe module solves this by both providing the ability to translate template group and template names, and having you create a separate entry for each language and piece of content in your site (hence, you have two separate URL titles). But that means buying into their entire methodology for a multi-lingual site.
Myself, I usually just stick to using the entry_id instead of the url_title, and live with the template names being in the primary language.
The best way I found to achieve this is by embedding templates with segment translations, duplicating template groups and duplicating channels.
In the blog/articles template:
{embed="shared/.head" segment_2_translation="blogue" segment_3_translation="articles"}
In the blogue/articles template:
{embed="shared/.head" segment_2_translation="blog" segment_3_translation="articles"}
In shared/.head template:
[...] {if lang == "fr"}English{if:else}Français{/if} [...]
Then you can create Articles (FR) and Articles (EN) channels, and each will have its own URL titles. You can also add a relationship custom field to each channel to associate an entry with its translation.
It feels messy, but it is the only way I could make it work without modules, plugins or whatnot.
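The idea behind the segment translations plus the relationship field can be sketched in Python: given the current language and URL segments, build the URL of the other language's version. The translation tables below are hypothetical stand-ins for the segment_2_translation/segment_3_translation embed variables and the relationship field; ExpressionEngine itself does this in templates, not in code like this.

```python
from typing import Dict, List

# Per-language segment translations (stand-ins for the embed variables).
SEGMENT_TRANSLATIONS: Dict[str, Dict[str, str]] = {
    "en": {"blog": "blogue", "articles": "articles"},
    "fr": {"blogue": "blog", "articles": "articles"},
}

# Hypothetical stand-in for the relationship field linking an entry to its translation.
TRANSLATED_URL_TITLE: Dict[str, str] = {
    "my-test-article": "mon-article-test",
    "mon-article-test": "my-test-article",
}

SITE = "http://www.example.com/index.php"


def alternate_url(lang: str, segments: List[str]) -> str:
    """Build the other language's URL for /<group>/<template>/<url_title>."""
    other = "fr" if lang == "en" else "en"
    table = SEGMENT_TRANSLATIONS[lang]
    group, template, url_title = segments
    return "/".join([SITE, other, table[group], table[template],
                     TRANSLATED_URL_TITLE[url_title]])
```

For the English article from the question, this yields the French URL .../fr/blogue/articles/mon-article-test, and vice versa.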

Sitecore country ISO in url

I have created a site with multiple languages in Sitecore. In the content editor (System > Languages) I have specified three languages (Dutch, English and German). Now I have two problems.
When an item has, for example, an English version but no German or Dutch version, and I type the address of the German site (www.testsite.com/de), I get the German site but without content. In this case I want a 404 page to be shown.
The other problem is that when I request a language that is not specified in System > Languages or on the item, I still get an empty site. In this case I also want a 404 page to be shown. Sitecore shows the page as long as the language is a valid ISO code.
I'm using Sitecore 6.4
Does anybody have a solution for these problems?
Thanks in advance!
mrtentje
My LinkManager is specified as follows in the Web.config:
<add name="sitecore" type="Sitecore.Links.LinkProvider, Sitecore.Kernel" addAspxExtension="true" alwaysIncludeServerUrl="false" encodeNames="true" languageEmbedding="asNeeded" languageLocation="filePath" shortenUrls="true" useDisplayName="false"/>
Unfortunately you have to manage both of these scenarios manually in Sitecore, they both have quite simple solutions but will require some development on your part.
For the first (accessing of pages without translations) I think you would need to extend the current ItemResolver within Sitecore and have it explicitly check that a version exists for the language that has been selected. I haven't implemented that myself but that's how I'd look at handling it.
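The check such an extended ItemResolver would perform can be modeled in a few lines. This is a language-neutral Python sketch, not Sitecore code (a real implementation would be C# inside the resolver); the Item class and its versions map are hypothetical simplifications of an item with per-language versions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Item:
    path: str
    # Hypothetical: language code -> number of versions in that language.
    versions: Dict[str, int] = field(default_factory=dict)


def resolve_item(item: Optional[Item], language: str) -> Optional[Item]:
    """Return the item only if it has at least one version in `language`."""
    if item is None:
        return None
    if item.versions.get(language, 0) > 0:
        return item
    return None  # the caller treats None as "not found" and serves a 404
```

Returning "no item" instead of an empty item is what lets the normal not-found handling produce the 404.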
The second (only accepting certain languages) is something I have handled, and it really bothered me that Sitecore couldn't handle it itself (though perhaps it does and I missed it). For this, I created a pipeline step called PermissableLanguageChecker, placed immediately after the LanguageResolver. It checks whether the language of the current request is one of a set of allowed values; if it isn't, it sets the language back to the default language (or, in your case, could throw a 404).
For the "allowable values", I read them from the site config with a new property there:
<site name="website" ... permissableLanguages="pl-PL,en" language="pl-PL" ... />
That permissableLanguages property is handy as we can also use it later on in the site when presenting a language selection control to the user.
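The logic of that pipeline step can be sketched in Python: parse the comma-separated permissableLanguages attribute and decide whether the requested language is allowed. This only models the decision (the real step is a Sitecore pipeline processor in C#), and the case-insensitive comparison is an assumption made here for robustness.

```python
from typing import FrozenSet


def parse_permissible(attr: str) -> FrozenSet[str]:
    """Parse a comma-separated site attribute such as "pl-PL,en"."""
    return frozenset(code.strip().lower() for code in attr.split(",") if code.strip())


def check_language(requested: str, permissible: FrozenSet[str]) -> int:
    """Return 200 if the requested language is allowed, otherwise 404."""
    return 200 if requested.strip().lower() in permissible else 404


if __name__ == "__main__":
    allowed = parse_permissible("pl-PL,en")
    for lang in ("en", "pl-PL", "de"):
        print(lang, check_language(lang, allowed))
```

Instead of returning 404, the same check could just as well switch the context language back to the site's default language, as described above.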
You may want to take a look at the Language Fallback module in the Sitecore Shared Source Library, as it covers some of your scenarios.
http://trac.sitecore.net/LanguageFallback

Resources