How to indicate a word exception for stemming in Hunspell - solr

I am using Hunspell to stem words for a SOLR instance. For the most part, it seems to be working well.
I'm using the OpenOffice dic/aff files.
However, there are some notable word exceptions, and I'd like to be able to remove these as candidates for stemming.
A great example is "skier", which stems to "sky" because of the following:
in the .dic file
sky/MDRSGZ
relevant rule in the .aff file
SFX R y ier [^aeiou]y
Is there any way to indicate that skier and only skier should be left alone?

Yes, this is a very common problem. Just remove the "R" flag from the entry:
sky/MDSGZ
You may then want to add "skier", and any other forms of it, back in on a separate line:
skier/MS
I have had to make numerous changes to this file, and now really wish there was a better option.
For example
Butter -> Butt
Corner -> Corn
Easter -> East
And then another one that is really confusing:
Wind == Wound
On my site, before we fixed it, a search for wind as in "wind power" returned a bunch of bruises and bloody wounds, because "wound" as in "I wound the clock" stemmed to "wind".
We also decided to remove all RE prefixes, because of things like:
remarkable -> mark
remove -> move
reset -> set
restore -> store
So if you know of a dictionary that is better suited to this, please let me know. (I think the main problem is that this dictionary is intended more for spell checking than for stemming.)
I would be willing to start and/or contribute to a git project for a real stemming dictionary to replace this spelling dictionary for everyone out there using this.
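The manual edit described in this answer (drop one flag from an entry, then add the derived word back as its own entry) could be scripted when many exceptions pile up. This is only a rough sketch in Python: the function name drop_flag and the sample lines are made up for illustration, and it assumes single-character flags and no morphological fields after the flags.

```python
def drop_flag(dic_lines, stem, flag, exceptions):
    """Return a patched copy of .dic lines.

    Assumes one-character flags (e.g. "sky/MDRSGZ") and no extra
    fields after the flags; both are simplifications.
    """
    out = []
    for line in dic_lines:
        word, sep, flags = line.partition("/")
        if sep and word == stem:
            # Remove the offending affix flag from this entry only.
            flags = flags.replace(flag, "")
            line = word + "/" + flags if flags else word
        out.append(line)
    # Add the exception words back as standalone entries.
    out.extend(exceptions)
    # The first line of a .dic file is an approximate entry count.
    if out and out[0].strip().isdigit():
        out[0] = str(int(out[0]) + len(exceptions))
    return out


# Made-up three-line dictionary for illustration.
sample = ["2", "sky/MDRSGZ", "skid/MS"]
patched = drop_flag(sample, "sky", "R", ["skier/MS"])
print("\n".join(patched))
```

The index has to be rebuilt after the dictionary changes, since the stems were computed at index time.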

Have you tried FreeLing? It is open source.
A demo page is here:
http://nlp.lsi.upc.edu/freeling/demo/demo.php
When I pick English and PoS tagging, I get the following result:
you wound the clock?
you wind the clock?
PRP VBD DT NN ?
Also, "skier" and "wind power" both get noun stems. It is a great stemmer and analyzer.
I'm not sure about the licensing. The download page:
http://devel.cpl.upc.edu/freeling/downloads?order=time&desc=1

Related

PDDL precondition not working correctly in the plan

I am working on a project in PDDL. The idea is to pick up four balls and transfer them to the conveyer (defined in the goal). The simple pickup, move and drop actions work fine, but when I try to make it more complicated, e.g. by adding different poses for the robot or detecting the item before picking, the plan does not follow the preconditions. For example, in the pick action the correct pose is not enforced. Any ideas about the mistake in the code? The final plan should have the correct pose for each action and detect the items one by one, not all at once.
link below:
http://editor.planning.domains/#read_session=BzTaNrk4dQ
faulty output:
https://i.stack.imgur.com/ubWS8.png
Likely something missing/wrong in the precondition. You have this for the pickup action:
(exists (?f - pose ?g - gripper)
(at-pose robotarm pregrasppose))
Note that you don't use the variables ?f and ?g at all in the fluent.
Thanks, haz, for going through the code.
I was able to debug it. The assignment in the preconditions was incorrect. The correct way of tying a parameter to a value is the following:
(= ?p findshirt)
In the line above, the parameter ?p is required to equal findshirt.

Considerate pathfinding AI

I want 1-wide streams of stuff to path-find in a way that is considerate of other streams sharing the same 2-wide path.
Let's say I have a map like this ("0" cells are blocked, "-" cells are passable, "1" and "A" are the starting points, and "2" and "B" are the destinations):
000000000000
0000000001A0
000000000--0
0B200------0
0--00------0
0------00000
0------00000
000000000000
If I have "A" path-find to "B" with the A* algorithm, it blocks the path from "1" to "2" ("=" is the path):
000000000000
0000000001A0
000000000-=0
0B200======0
0=-00=-----0
0=====-00000
0------00000
000000000000
Yes I could path-find "1" to "2" then make the AB path but that won't always work. Case in point is this:
00000000000000000000
000000000000000001A0
00000000000000000--0
0B200------00------0
0--00------00------0
0------00------00000
0------00------00000
00000000000000000000
The A* path-finding from "1" to "2" blocks the path for "A" to "B"
000000000000000001A0
00000000000000000=-0
0B200------00=====-0
0-=00=====-00=-----0
0-====-00=====-00000
0------00------00000
00000000000000000000
"A" to "B" blocks "1" to "2"
000000000000000001A0
00000000000000000-=0
0B200------00======0
0=-00=====-00=-----0
0=====-00=====-00000
0------00------00000
00000000000000000000
Additional clarification: "A", "B", "1", and "2" can be anywhere in a user-created map. There will be anywhere from 1 to 10 paths running at the same time, starting and stopping separately, though the AI only needs to take account of the other paths that are currently active. It also needs to happen live, so it cannot take even seconds to compute.
So how can I make an AI smart enough not to block another path? Right now I'm using A*, so is there an improvement to it, or should I use an entirely new AI system? (Both work for me.)
If I have understood correctly, you are searching for Cooperative Pathfinding. In the last decade, many solutions for this problem have been proposed. You can find a nice summary of them in this paper.
I'll give you a small recap:
Local Repair A*: Each agent searches for a route to the destination using the A* algorithm, ignoring all other agents except for its current neighbors. The agents then begin to follow their routes, until a collision is imminent. Whenever an agent is about to move into an occupied position it instead recalculates the remainder of its route. A bit of "brute-forcing", it is not really state of the art but it is "easy" to implement and the current industry standard in video games. Unfortunately I'm not able to find the pseudo-code for the algorithm. :(
Cooperative A*: A new algorithm for solving the Cooperative Pathfinding problem. The task is decoupled into a series of single-agent searches. The individual searches are performed in three-dimensional space-time, and take account of the planned routes of other agents. A wait move is included in the agent's action set, to enable it to remain stationary.
Hierarchical Cooperative A*: As before but in a hierarchical way.
Windowed Hierarchical Cooperative A*: The state of the art at the time (2005 I think). There is an interesting demo on the internet with Java source code and everything. To understand why WHCA* is better go to page 3 of the paper.
I hope this can be enough to start exploring this field by yourself if you need. :)
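To make the Cooperative A* recap a bit more concrete, here is a minimal Python sketch of a space-time search with a reservation table. It is not the paper's implementation. The names plan, plan_all and reserved are made up, agents are planned one after another in a fixed order, the grid uses "0" for walls as in the question, and an agent is assumed to leave the map once it reaches its goal, so it stops reserving cells.

```python
import heapq

MOVES = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))  # (0, 0) is the wait move


def plan(grid, start, goal, reserved, max_t=200):
    """A* over (cell, time) states, avoiding cells in `reserved`.

    `reserved` maps (cell, t) to the name of the agent holding that cell.
    """
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    heap = [(h(start), 0, start, [start])]
    seen = {(start, 0)}
    while heap:
        _, t, pos, path = heapq.heappop(heap)
        if pos == goal:
            return path
        if t >= max_t:
            continue
        for dr, dc in MOVES:
            nxt = (pos[0] + dr, pos[1] + dc)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] == "0":
                continue
            if (nxt, t + 1) in reserved:
                continue  # another agent will be in that cell
            owner = reserved.get((nxt, t))
            if owner is not None and owner == reserved.get((pos, t + 1)):
                continue  # would swap places head-on with that agent
            if (nxt, t + 1) in seen:
                continue
            seen.add((nxt, t + 1))
            heapq.heappush(heap, (t + 1 + h(nxt), t + 1, nxt, path + [nxt]))
    return None  # no route within max_t steps


def plan_all(grid, agents):
    """Plan agents in order; each one reserves its cells for later agents."""
    reserved, paths = {}, {}
    for name, (start, goal) in agents.items():
        path = plan(grid, start, goal, reserved)
        paths[name] = path
        for t, cell in enumerate(path or []):
            reserved[(cell, t)] = name
    return paths


grid = ["0000",
        "0--0",
        "0--0",
        "0000"]
paths = plan_all(grid, {"a": ((1, 1), (1, 2)), "b": ((1, 2), (1, 1))})
print(paths)
```

Because earlier agents never re-plan, a later agent can end up with no route at all in a tight corridor. The windowed variant in the recap limits how far ahead the reservations reach, which is one way of softening that.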
As with most problems - finding the actual constraints can help to identify the search space you are looking at.
It is not clear from the example problems whether both paths can take wildly different routes or whether the paths have to travel alongside each other. Do you know if there will always be a solution where all the paths on the map can be routed simultaneously?
If you simply require a road to support n-wide gaps (for n parallel paths), this seems like a simple tweak to the search space/problem representation, which could probably be done with A*.
Also, you mentioned streams: is behaviour over time a dimension of the problem? Could there be an option for time-sharing (alternate use) of a narrow gap between multiple streams? Or perhaps shorter convoys of stream elements that can path-find on their own?

How to search three or more arrays row by row for an optimum value in MATLAB

I have a few variables and here they are, three variables "R1, R2 and R3" each have a size of [40 x 1].
I have a fourth variable U of the same dimension. For every U(i) I need to search for an optimum value among R1(i), R2(i) and R3(i), which would return a single value. I intend to plot the optimum value against U(i). I have been trying to wrap my head around the knnsearch function, but no luck.
Any one out there who could please help??
Thanks
Well when I can't wrap my head around something, I don't come here first.
A lot of people forget this one because we are online, but read a book on the topic. Have your code open so when you see something in the book, test it out.
Draw out any type of diagram. I call these "Napkin Diagrams", because I write it on anything, even a napkin.
I play with code until my keyboard has no letters left on it, then I keep plugging away until the keys fall off
Explore the language API's
Check for public repositories that you can play with
Google, is okay for a quick reference, but google will not teach you anything other than how to google
I talk my code over with myself all the time, people think I'm nuts, but so do I . . It actually works sometimes.
Then if I still can't get it, I come here with a list of things that I have tried, sample code that has not worked, etc.
I used to hate when people told me this, but it was the best thing anyone could have done for me, so I tend to do the same now. Thinking about coding is a big part, but you have to get done what you can. Then we all know what level you are at. Plus, it being the end of the semester, a lot of these types of questions are homework...
Thinking is good, now turn those thoughts into a conceptual design . It's okay to be wrong in this stage, its all just conceptual
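If I understood correctly, this might be what you need:
RR = [R1(:) R2(:) R3(:)];
d = bsxfun(@minus, RR, U(:));
[m, mi] = min(abs(d), [], 2);
answer = RR(sub2ind(size(RR), (1:size(RR,1))', mi));
First, put the three vectors into a single matrix (one column per vector):
RR = [R1(:) R2(:) R3(:)];
Next, take the difference with U; bsxfun is ideal for this kind of thing:
d = bsxfun(@minus, RR, U(:));
Now find the minimum absolute difference for each row (mi holds the winning column index):
[m, mi] = min(abs(d), [], 2);
The corresponding indices allow you to pick the "best fit" from each row. Note that RR(:,mi) would return a whole matrix, so convert the (row, column) pairs to linear indices first:
answer = RR(sub2ind(size(RR), (1:size(RR,1))', mi));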
I had to do some mind reading to get to this 'answer', so feel free to correct my misunderstanding of your problem!
Update: if you just need the highest of the three values, then
val = max([R1(:) R2(:) R3(:)]');
plot(U, val);
should be all you need...
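The two readings of the question used in this answer (the value closest to U(i) in each row, or simply the largest value in each row) can be checked outside MATLAB with a small pure-Python sketch. The function names and the sample numbers are made up for illustration.

```python
def closest_per_row(u, *columns):
    """For each row i, return the candidate closest to u[i]."""
    return [
        min(row, key=lambda value: abs(value - target))
        for target, row in zip(u, zip(*columns))
    ]


def largest_per_row(*columns):
    """For each row i, return the largest candidate."""
    return [max(row) for row in zip(*columns)]


# Made-up sample data: three rows instead of forty.
U = [1.0, 5.0, 9.0]
R1 = [0.0, 4.0, 20.0]
R2 = [1.5, 7.0, 8.0]
R3 = [3.0, 5.5, 10.5]

closest = closest_per_row(U, R1, R2, R3)
largest = largest_per_row(R1, R2, R3)
print(closest)
print(largest)
```

Either result is a list with one value per row, so it can be plotted directly against U.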

Eclipse - How do I view java arrays / collections better in debugger

Viewing/Searching java arrays and collections in the Eclipse Java debugger is tedious and time-consuming.
I tried this promising plugin (in alpha as of Aug 2012)
http://www.cvast.tuwien.ac.at/projects/visualdebugging/ArrayExplorer
But it freezes Eclipse for simple arrays beyond a few hundred elements.
I do use Detail formatters, but that still needs clicking on each element to see the values.
Are there any better ways to view this array/collection data?
Use the 'Expressions' tab.
There you can type in any number of expressions and have them evaluated in the current scope.
e.g. collection.size(), collection.getValueAt(i), etc.
Eclipse > Preferences > Java > Debug > Detail Formatters
This may be close to what you are looking for. It is tedious to set up, but once done you can see the values of objects in the Expressions window.
Here is link to start
Override the toString method of your class and you will be able to see what you want to see. I'm attaching an example to show you exactly that.
Even though I could not find a way to see them in a nice table/array, I found a halfway workaround.
The solution is to define a static method in a throwaway class that takes the array as input and returns a string of the concatenated values that one wants to quickly glance at. It can include the array index and newlines so that the results are formatted nicely. It can be fine-tuned to print only certain array indices to reduce clutter.
This static method can then be used in the watch area.

Understanding parsing SVG file format

First off, gist here
Map.svg in the gist is the original Map I'm working with, got it off wikimedia commons.
Now, there is a land mass off the eastern coast of Texas in that original SVG. I removed it using Inkscape, and it rewrote the path in a strange new way. The diff is included in the gist.
This new way of writing the path blows up my parser logic, and I'm trying to understand what happened. I'm hoping someone here knows more about the SVG file format than I do. I will admit I have not read through the entire SVG standard spec; however, the parts of it I did read didn't mention anything about missing commands or relative coordinates. Then again, I may have been looking at the wrong spec, not sure.
The way I understood it, SVG path data was very straightforward, something like this:
(M,L,C)[point{n}] .... [Z] then repeat ad nauseam
The part I'm trying to understand is this: Inkscape has written out what seem to be relative coordinates, without commands like L, or with L being implied somehow. My gut is telling me that what has happened here is obvious to someone. For what it's worth, I'm doing my parsing in C.
If you're parsing SVG, why not look at the SVG specification?
Start a new sub-path at the given (x,y) coordinate. M (uppercase) indicates that absolute coordinates will follow; m (lowercase) indicates that relative coordinates will follow. If a moveto is followed by multiple pairs of coordinates, the subsequent pairs are treated as implicit lineto commands.
From: http://www.w3.org/TR/2011/REC-SVG11-20110816/paths.html#PathDataMovetoCommands
You said,
The way I understood it, SVG path data was very straight forward, something like this: (M,L,C)[point{n}] .... [Z]
I don't know where you got that information. Stop getting your information from that source.
I will admit I have not read through the entire SVG standard spec...
Nobody reads the entire spec. Just focus on the part you're implementing at the moment. You could also start with SVG Tiny, and work with that subset for now.
Path Grammar is where you should start when writing a parser. If you can't read it, then buy a book on compilers.
Path grammar: http://www.w3.org/TR/2011/REC-SVG11-20110816/paths.html#PathDataBNF
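The implicit-lineto rule quoted above, together with the general rule that a command letter may be omitted when the same command repeats, can be sketched as a small normalizer that turns path data into explicit (command, arguments) pairs. This is an illustrative Python sketch rather than a full implementation of the grammar: the names TOKEN, ARG_COUNT and expand_path are made up, and arc flags written without separators are not handled. The same loop structure carries over to C.

```python
import re

# A command letter, or a number (optional sign, decimals, exponent).
TOKEN = re.compile(
    r"[MmLlHhVvCcSsQqTtAaZz]|[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
)
# How many numbers each command consumes per repetition.
ARG_COUNT = {"m": 2, "l": 2, "h": 1, "v": 1, "c": 6,
             "s": 4, "q": 4, "t": 2, "a": 7, "z": 0}


def expand_path(d):
    """Return [(command, [numbers])] with implicit commands made explicit."""
    tokens = TOKEN.findall(d)
    out, cmd, i = [], None, 0
    while i < len(tokens):
        if tokens[i].isalpha():
            cmd = tokens[i]
            i += 1
            if cmd in "Zz":
                out.append((cmd, []))
                continue
        elif cmd is None or cmd in "Zz":
            raise ValueError("numbers without a preceding command")
        n = ARG_COUNT[cmd.lower()]
        args = [float(tok) for tok in tokens[i:i + n]]  # a letter here raises
        if len(args) != n:
            raise ValueError("too few numbers for " + cmd)
        out.append((cmd, args))
        i += n
        # Extra coordinate pairs after a moveto are implicit linetos.
        if cmd == "M":
            cmd = "L"
        elif cmd == "m":
            cmd = "l"
    return out


relative = expand_path("m 10,20 5,5 -3,4 z")
absolute = expand_path("M0 0 10 10L20 20 30 30")
print(relative)
print(absolute)
```

Once the commands are explicit, lowercase commands only need their coordinates added to the current point to become absolute.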
