I'm implementing a full text search based on the song database on GuitarParty.com. The data consists of lyrics in multiple languages, which is not a problem per se.
However, when search results are returned using snippeted_fields all accented characters within words, such as ÚúÉéÍí, are returned using their generic unaccented versions, UuEeIi.
This is how I form my query:
query = search.Query(
query_string=qs,
options=search.QueryOptions(
sort_options=search.SortOptions(
#match_scorer=search.MatchScorer(),
match_scorer=search.RescoringMatchScorer(),
expressions=[
search.SortExpression(expression='_score + importance * 0.03', default_value=0)
#search.SortExpression(expression='_score', default_value=0)
],
limit=1000,
),
cursor=cursor,
returned_fields=['title','atomtitle','item', 'image'],
snippeted_fields=['title','atomtitle', 'body','item'],
)
)
I'm pretty sure this is is not an encoding issue since everything looks just right if I pull my document fields directly (as I do with the titles). It's only the snippeted exoressions that display incorrectly.
To better see what I'm referring to you can take my test engine for a spin here: http://gp-search.appspot.com/ and search for something Icelandic. Example phrase: Vísur vatnsenda Rósu
This will return a document with this snippet:
Augun min og augun þin. O þa fogru steina. Mitt er þitt og þitt er mitt, þu veist hvað eg mei- na. Langt er siðan sa eg hann sannlega friður var hann.
Correctly spelled snippet should be:
Augun mín og augun þín. Ó þá fögru steina. Mitt er þitt og þitt er mitt, þú veist hvað eg mei- na. Langt er síðan sá ég hann sannlega friður var hann.
Am I better off generating my own snipped from the document data, or is there something I can do to pull snippets with accented characters within words?
The data you put in gets normalized so that you dont have to worry about accents or missing accents when searching it.
Related
My solr documents consists of information as described below:
{
"rid":0,
"resourceName":"cd05",
"metadata":"[u'org.apache.tika.parser.DefaultParser', u'org.apache.tika.parser.txt.TXTParser']",
"content":"\n\n\n\n\n\n\n\n\nFurthermore , as an encouragement to revisionist thinking , it manifestly is fair to admit that any fraternity has a constitutional right to refuse to accept persons it dislikes . The Unitarian clergy were an exclusive club of cultivated gentlemen -- as the term was then understood in the Back Bay -- and Parker was definitely not a gentleman , either in theology or in manners . Ezra Stiles Gannett , an honorable representative of the sanhedrin , I see not how a miracle proves a doctrine '' .\n",
"_version_":1626177165603110912
}
I am searching on content and want to use highlight feature of Solr. My search string is http://localhost:8983/solr/core_name/select?hl.fragsize=10&hl.fl=content&hl=on&indent=on&q=content:*fair\%20to\%20admit*&wt=json. Despite providing fragsize as parameter I am getting whole content in highlight. I only want 10 nearby characters not whole text.
How can I acheive this ?
Is it possible to conduct multiple spatial queries within the same SOLR (3.1+) request?
We currently have a need to allow user to search for inventory with a location of their choice via a frontend search form. But we want to also add another spatial search behind the scenes so it will include more inventory. The resulting search would result in a venn diagram type of search.
Edit 10.4.2011
Example construct: q=*:*&fq={!geofilt}&sfield=Location&(ClientId:"client1"&pt=40.68063802521456,-74.00390625&d=80.4672)%20OR%20_query_:(ClientId:"client2"&pt=36.1146460,-115.1728160&d=80.4672)
The above construct does not work, but hopefully demonstrates what I am trying to accomplish.
This is old, but it doesn't seem like it ever got a full answer. I had the same issue and found that this syntax works:
q =*:*& fq = (({
!geofilt sfield = Location pt = 40.68063802521456,
-74.00390625 d = 80.4672
}
AND ClientId : "client1")OR({
!geofilt sfield = Location pt = 36.1146460,
-115.1728160 d = 80.4672
}
AND ClientId : "client2"))
It looks like, you like to run N querys in one request in order to get one result set per query?!
So Field Collapsing ( http://wiki.apache.org/solr/FieldCollapsing ) is what you are looking for. Unfortunately FieldCollapsing is only available from 3.3.
Depending on your needs, maybe counted results from different faceted searches could be also useful?!
What if you moved your second location query into an additional filter query, like below:
q=*:*&fq={!geofilt}&sfield=Location&(ClientId:"client1"&pt=40.68063802521456,-74.00390625&d=80.4672)&fq={!geofilt}&sfield=Location&(ClientId:"client2"&pt=36.1146460,-115.1728160&d=80.4672)
Will that provide the results that you are looking for? It might end up being too limiting, but thought it was worth trying.
You might also try:
q=*:*&fq={!geofilt}&sfield=Location&((ClientId:"client1"&pt=40.68063802521456,-74.00390625&d=80.4672)%20OR%20(ClientId:"client2"&pt=36.1146460,-115.1728160&d=80.4672))
A friend of mine with little experience in (Telelogic Doors) DXL was given a
problem to search through a document for objects with possible matches of string.
The problem was :
We have 2 attributes: Severity and Likelihood
Please see the table below for their values:
Edit added (Sample):
A sample document looks as follows
2) So now if I have a combination like Severity = Negligible AND Likelihood = Improbable, then I want to parse through the document and then try to find all the objects that have these values and display the total number of objects.
3) Then I move to the next combination ex: Severity = Minor Injury AND Likelihood = Unlikely and then display the total objects for this combination.
4) So now I go through all the 25 combinations and display the total for each combination.
Trouble is I have no experience of DXL .
I know how to do it in C/C++ but not in DXL.
Need a DXL based solution to above.
Must you do this in DXL? It may be much easier to do this another way. For example, depending on how the documents are structured, you might be able to create a view categorized by severity and likelihood, and then present totals for each category.
Or you could export the data and calculate the totals easily using a spreadsheet.
UPDATE:
DXL is merely a XML format that applies to Domino. So once you have a database in DXL format, you can parse it like any other XML document using C/C++ if you're most comfortable with that. The key to this task, then, is to get the database into a DXL format.
With the Lotus Notes C/C++ API you can create a DXLExport from a NotesSession object, and call into the DXLExporter class to perform the export (excuse me if I'm messing up the object names - I'm used to LotusScript mainly).
Another option that could work for you is to use this DXLExporter Wizard for Domino 8.5. That will take the work out of creating the DXL and you can focus on parsing it instead.
This has nothing to do with Domino DXL, but everything with Telelogic Doors eXtension Language. The documentation: http://publib.boulder.ibm.com/infocenter/rsdp/v1r0m0/index.jsp?topic=/com.ibm.help.download.doors.doc/topics/doors_version9_1.html
Suggestion: remove the lotus-notes tag.
Amitd, the most straight-forward path is to create a view with object heading, object text, severity, likelihood, and any other relevant attributes; then perfom the basic export to Excel. Once in Excel, we'll manipulate the data as required.
Open the exported document, and sort by Severity, then Likelihood. Create the aggregate calculations using the Excel built-in functions: COUNTIF, and Data > GROUP and Data > SUBTOTAL options. You may then sort on the aggregate totals, or filter on the attributes for different combinations.
FYI - DXL is DOORS eXtension Language--nothing to do with Domino.
I know I am a bit late to the party here but what you are looking for is this:
First save your input file as a csv.
Module m = current
Object o
Stream infile = read("PathToYourCsvFile")
string inline = ""
infile >> inline
while(!end(infile))
{
string Severity = ""
string Likelihood = ""
// Here do some code to get the values from the line in the csv
// If you still are interested I can add this in with an update later.
for o in m do
{
if((o."Severity" "" == Severity) && (o."Likelihood" "" == Likelihood))
{
count++;
}
}
infoBox "Severity = " Severity " and Likelihood = " Likelihood " MATCHES: " count ""
}
This will pop up a box after each line is calculated with the number of matches found. You can easily have it just popup one box at the end with all of the matches if you wanted. Again if you are still interested, give me some more info on what you want for inputs and outputs and I can give you more accurate code.
Simple one really. In SQL, if I want to search a text field for a couple of characters, I can do:
SELECT blah FROM blah WHERE blah LIKE '%text%'
The documentation for App Engine makes no mention of how to achieve this, but surely it's a common enough problem?
BigTable, which is the database back end for App Engine, will scale to millions of records. Due to this, App Engine will not allow you to do any query that will result in a table scan, as performance would be dreadful for a well populated table.
In other words, every query must use an index. This is why you can only do =, > and < queries. (In fact you can also do != but the API does this using a a combination of > and < queries.) This is also why the development environment monitors all the queries you do and automatically adds any missing indexes to your index.yaml file.
There is no way to index for a LIKE query so it's simply not available.
Have a watch of this Google IO session for a much better and more detailed explanation of this.
i'm facing the same problem, but i found something on google app engine pages:
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2",
"abc",
u"abc" + u"\ufffd")
This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html
maybe this could do the trick ;)
Altough App Engine does not support LIKE queries, have a look at the properties ListProperty and StringListProperty. When an equality test is done on these properties, the test will actually be applied on all list members, e.g., list_property = value tests if the value appears anywhere in the list.
Sometimes this feature might be used as a workaround to the lack of LIKE queries. For instance, it makes it possible to do simple text search, as described on this post.
You need to use search service to perform full text search queries similar to SQL LIKE.
Gaelyk provides domain specific language to perform more user friendly search queries. For example following snippet will find first ten books sorted from the latest ones with title containing fern
and the genre exactly matching thriller:
def documents = search.search {
select all from books
sort desc by published, SearchApiLimits.MINIMUM_DATE_VALUE
where title =~ 'fern'
and genre = 'thriller'
limit 10
}
Like is written as Groovy's match operator =~.
It supports functions such as distance(geopoint(lat, lon), location) as well.
App engine launched a general-purpose full text search service in version 1.7.0 that supports the datastore.
Details in the announcement.
More information on how to use this: https://cloud.google.com/appengine/training/fts_intro/lesson2
Have a look at Objectify here , it is like a Datastore access API. There is a FAQ with this question specifically, here is the answer
How do I do a like query (LIKE "foo%")
You can do something like a startWith, or endWith if you reverse the order when stored and searched. You do a range query with the starting value you want, and a value just above the one you want.
String start = "foo";
... = ofy.query(MyEntity.class).filter("field >=", start).filter("field <", start + "\uFFFD");
Just follow here:
init.py#354">http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/search/init.py#354
It works!
class Article(search.SearchableModel):
text = db.TextProperty()
...
article = Article(text=...)
article.save()
To search the full text index, use the SearchableModel.all() method to get an
instance of SearchableModel.Query, which subclasses db.Query. Use its search()
method to provide a search query, in addition to any other filters or sort
orders, e.g.:
query = article.all().search('a search query').filter(...).order(...)
I tested this with GAE Datastore low-level Java API. Me and works perfectly
Query q = new Query(Directorio.class.getSimpleName());
Filter filterNombreGreater = new FilterPredicate("nombre", FilterOperator.GREATER_THAN_OR_EQUAL, query);
Filter filterNombreLess = new FilterPredicate("nombre", FilterOperator.LESS_THAN, query+"\uFFFD");
Filter filterNombre = CompositeFilterOperator.and(filterNombreGreater, filterNombreLess);
q.setFilter(filter);
In general, even though this is an old post, a way to produce a 'LIKE' or 'ILIKE' is to gather all results from a '>=' query, then loop results in python (or Java) for elements containing what you're looking for.
Let's say you want to filter users given a q='luigi'
users = []
qry = self.user_model.query(ndb.OR(self.user_model.name >= q.lower(),self.user_model.email >= q.lower(),self.user_model.username >= q.lower()))
for _qry in qry:
if q.lower() in _qry.name.lower() or q.lower() in _qry.email.lower() or q.lower() in _qry.username.lower():
users.append(_qry)
It is not possible to do a LIKE search on datastore app engine, how ever creating an Arraylist would do the trick if you need to search a word in a string.
#Index
public ArrayList<String> searchName;
and then to search in the index using objectify.
List<Profiles> list1 = ofy().load().type(Profiles.class).filter("searchName =",search).list();
and this will give you a list with all the items that contain the world you did on the search
If the LIKE '%text%' always compares to a word or a few (think permutations) and your data changes slowly (slowly means that it's not prohibitively expensive - both price-wise and performance-wise - to create and updates indexes) then Relation Index Entity (RIE) may be the answer.
Yes, you will have to build additional datastore entity and populate it appropriately. Yes, there are some constraints that you will have to play around (one is 5000 limit on the length of list property in GAE datastore). But the resulting searches are lightning fast.
For details see my RIE with Java and Ojbectify and RIE with Python posts.
"Like" is often uses as a poor-man's substitute for text search. For text search, it is possible to use Whoosh-AppEngine.
Simple one really. In SQL, if I want to search a text field for a couple of characters, I can do:
SELECT blah FROM blah WHERE blah LIKE '%text%'
The documentation for App Engine makes no mention of how to achieve this, but surely it's a common enough problem?
BigTable, which is the database back end for App Engine, will scale to millions of records. Due to this, App Engine will not allow you to do any query that will result in a table scan, as performance would be dreadful for a well populated table.
In other words, every query must use an index. This is why you can only do =, > and < queries. (In fact you can also do != but the API does this using a a combination of > and < queries.) This is also why the development environment monitors all the queries you do and automatically adds any missing indexes to your index.yaml file.
There is no way to index for a LIKE query so it's simply not available.
Have a watch of this Google IO session for a much better and more detailed explanation of this.
i'm facing the same problem, but i found something on google app engine pages:
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2",
"abc",
u"abc" + u"\ufffd")
This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html
maybe this could do the trick ;)
Altough App Engine does not support LIKE queries, have a look at the properties ListProperty and StringListProperty. When an equality test is done on these properties, the test will actually be applied on all list members, e.g., list_property = value tests if the value appears anywhere in the list.
Sometimes this feature might be used as a workaround to the lack of LIKE queries. For instance, it makes it possible to do simple text search, as described on this post.
You need to use search service to perform full text search queries similar to SQL LIKE.
Gaelyk provides domain specific language to perform more user friendly search queries. For example following snippet will find first ten books sorted from the latest ones with title containing fern
and the genre exactly matching thriller:
def documents = search.search {
select all from books
sort desc by published, SearchApiLimits.MINIMUM_DATE_VALUE
where title =~ 'fern'
and genre = 'thriller'
limit 10
}
Like is written as Groovy's match operator =~.
It supports functions such as distance(geopoint(lat, lon), location) as well.
App engine launched a general-purpose full text search service in version 1.7.0 that supports the datastore.
Details in the announcement.
More information on how to use this: https://cloud.google.com/appengine/training/fts_intro/lesson2
Have a look at Objectify here , it is like a Datastore access API. There is a FAQ with this question specifically, here is the answer
How do I do a like query (LIKE "foo%")
You can do something like a startWith, or endWith if you reverse the order when stored and searched. You do a range query with the starting value you want, and a value just above the one you want.
String start = "foo";
... = ofy.query(MyEntity.class).filter("field >=", start).filter("field <", start + "\uFFFD");
Just follow here:
init.py#354">http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/search/init.py#354
It works!
class Article(search.SearchableModel):
text = db.TextProperty()
...
article = Article(text=...)
article.save()
To search the full text index, use the SearchableModel.all() method to get an
instance of SearchableModel.Query, which subclasses db.Query. Use its search()
method to provide a search query, in addition to any other filters or sort
orders, e.g.:
query = article.all().search('a search query').filter(...).order(...)
I tested this with GAE Datastore low-level Java API. Me and works perfectly
Query q = new Query(Directorio.class.getSimpleName());
Filter filterNombreGreater = new FilterPredicate("nombre", FilterOperator.GREATER_THAN_OR_EQUAL, query);
Filter filterNombreLess = new FilterPredicate("nombre", FilterOperator.LESS_THAN, query+"\uFFFD");
Filter filterNombre = CompositeFilterOperator.and(filterNombreGreater, filterNombreLess);
q.setFilter(filter);
In general, even though this is an old post, a way to produce a 'LIKE' or 'ILIKE' is to gather all results from a '>=' query, then loop results in python (or Java) for elements containing what you're looking for.
Let's say you want to filter users given a q='luigi'
users = []
qry = self.user_model.query(ndb.OR(self.user_model.name >= q.lower(),self.user_model.email >= q.lower(),self.user_model.username >= q.lower()))
for _qry in qry:
if q.lower() in _qry.name.lower() or q.lower() in _qry.email.lower() or q.lower() in _qry.username.lower():
users.append(_qry)
It is not possible to do a LIKE search on datastore app engine, how ever creating an Arraylist would do the trick if you need to search a word in a string.
#Index
public ArrayList<String> searchName;
and then to search in the index using objectify.
List<Profiles> list1 = ofy().load().type(Profiles.class).filter("searchName =",search).list();
and this will give you a list with all the items that contain the world you did on the search
If the LIKE '%text%' always compares to a word or a few (think permutations) and your data changes slowly (slowly means that it's not prohibitively expensive - both price-wise and performance-wise - to create and updates indexes) then Relation Index Entity (RIE) may be the answer.
Yes, you will have to build additional datastore entity and populate it appropriately. Yes, there are some constraints that you will have to play around (one is 5000 limit on the length of list property in GAE datastore). But the resulting searches are lightning fast.
For details see my RIE with Java and Ojbectify and RIE with Python posts.
"Like" is often uses as a poor-man's substitute for text search. For text search, it is possible to use Whoosh-AppEngine.