How to search for :) in Solr

How does one search for specific punctuation in Solr, such as :)? I have tried URL encoding the text but I still get this message:
org.apache.solr.search.SyntaxError: Cannot parse ':': Encountered " ":" ": "" at line 1, column 0.
Was expecting one of:
<NOT> ...
"+" ...
"-" ...
<BAREOPER> ...
"(" ...
"*" ...
<QUOTED> ...
<TERM> ...
<PREFIXTERM> ...
<WILDTERM> ...
<REGEXPTERM> ...
"[" ...
"{" ...
<LPARAMS> ...
<NUMBER> ...
<TERM> ...
"*" ...
Additionally, I need to perform this search on a text field, not on a string field. How should I configure the analyser to preserve punctuation?
Note that searching Google for the subject is impossible due to two prolific Solr contributors with the name "Smiley"!

What configuration do you have for the text field?
Make sure the analyzer does not split on punctuation, which happens, for example, with StandardTokenizerFactory or a word delimiter filter.
You can define a custom field type with WhitespaceTokenizerFactory or KeywordTokenizerFactory and add further filters, such as lowercasing, on top of it. (Note that KeywordTokenizerFactory keeps the entire field value as a single token.)
Also, some characters have special meaning in the Solr/Lucene query syntax, e.g. + - ! ( ) { } [ ] ^ " ~ * ? :
You need to escape these special characters with a backslash. See the "Escaping Special Characters" section of the query parser documentation.

Instead of :) search for \:\) because both characters, : and ), have special meaning in Solr.
Escape every special operator character by prefixing it with a backslash (\).
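A minimal Java sketch of the backslash escaping both answers describe. The class and method names are hypothetical; if you use SolrJ, its ClientUtils.escapeQueryChars does comparable escaping. The character set is the Lucene list from the first answer plus the backslash itself, '/', '&' and '|' (for && and ||), and whitespace; treat that exact set as an assumption about what you want matched literally.

```java
public class SolrQueryEscaper {
    // Special characters from the answer's list, plus '\', '/', '&' and '|'.
    // Escaping every '&' and '|' also neutralizes the && and || operators.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character (and any whitespace) with a backslash.
    public static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape(":)")); // prints \:\)
    }
}
```

Escaping only gets the query past the parser. Whether \:\) actually matches anything still depends on the field's analyzer keeping :) as a token, as the first answer points out.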

Related

Need to write a regex to parse the command

I need a regex that captures three groups from strings of this form:
<any text, including newlines, optional: group 1>/command <text up to the first \n or </p>: group 2><any text, including newlines, optional: group 3>
What I tried:
Pattern pattern1 = Pattern.compile('(.*?)[/]command (.*?)\n?(.*?)');
For the following string, it should produce the groups below:
some\nthing/command cmdtext\nasdfjaklsdjf\nfgskdlkfg\ndgsdfgsdfgsdfg
group 1 - some\nthing
group 2 - cmdtext
group 3 - asdfjaklsdjf\nfgskdlkfg\ndgsdfgsdfgsdfg
What I can't work out is how to stop at an occurrence of </p>, and why .* does not capture the whole group. This, however, works for me:
String a = '\na\na\n\n\n\n\n\naaa';
Pattern pattern2 = Pattern.compile('\n(?s:.)*');
Matcher mchr = pattern2.matcher(a);
System.assert(mchr.matches());
This regular expression should match what you need:
'([\\s\\S]*)/command (.*?)(?:\n|</p>)([\\s\\S]*)'
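By default . does not match \n, so .* cannot span multiple lines. That is why I use [\\s\\S] instead (that is, [\s\S], with the backslashes escaped for an Apex string literal). Enabling the s flag, as in your (?s:.) snippet, is an alternative.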
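The same pattern, sketched in Java (Apex regular expressions are based on Java's java.util.regex, so the pattern should carry over unchanged). The class and method names are hypothetical. It uses matches(), so the whole input must fit the three-group shape.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommandParser {
    // The answer's pattern as a Java string literal. [\s\S] matches any character,
    // including newlines, which a plain '.' does not.
    private static final Pattern COMMAND =
            Pattern.compile("([\\s\\S]*)/command (.*?)(?:\n|</p>)([\\s\\S]*)");

    // Returns {group 1, group 2, group 3}, or null if the whole input does not match.
    public static String[] parse(String input) {
        Matcher m = COMMAND.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] groups = parse("some\nthing/command cmdtext\nasdfjaklsdjf\nfgskdlkfg\ndgsdfgsdfgsdfg");
        for (int i = 0; i < groups.length; i++) {
            System.out.println("group " + (i + 1) + ": " + groups[i]);
        }
    }
}
```

Because the lazy (.*?) stops at the first \n or </p>, group 2 for "intro/command run it</p>tail" is "run it". If the command text can also run to the end of the input, a variant such as (?:\n|</p>|$) would be needed; as written, such input does not match.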

Parsing a file in C with a regex

I am trying to parse a long file like this one (the output of the play command on Linux):
File :1.mp3
In:0.00% 00:00:00.00 [00:03:14.51] Out:0 [ | ]
In:0.19% 00:00:00.37 [00:03:14.14] Out:16.4k [ | ]
In:0.29% 00:00:00.56 [00:03:13.95] Out:24.6k [======|======]
In:0.33% 00:00:00.65 [00:03:13.86] Out:28.7k [ =====|===== ]
In:0.43% 00:00:00.84 [00:03:13.67] Out:36.9k [ =====|===== ]
In:0.53% 00:00:01.02 [00:03:13.49] Out:45.1k [ -====|===== ]
In:0.62% 00:00:01.21 [00:03:13.30] Out:53.2k [ =====|===== ]
In:0.72% 00:00:01.39 [00:03:13.11] Out:61.4k [-=====|======]
In:0.81% 00:00:01.58 [00:03:12.93] Out:69.6k [-=====|=====-]
In:0.91% 00:00:01.76 [00:03:12.74] Out:77.8k [-=====|=====-]
In:0.96% 00:00:01.86 [00:03:12.65] Out:81.9k [ =====|===== ]
and so on.
I would like to extract the percentage.
How can I do it without reading the whole file into a string (it is large, about 100 KB)?
I thought of using this regular expression: "In:(\d{1,2}\.\d{2})"
How do I do it?
Try this regex:
/^In:([0-9]{1,3}\.[0-9]{1,2})\%/gm
Explanation:
/ Opens the pattern.
^ Matches the start of the string (or of each line, with the m flag below).
In: Matches "In:".
( ) Captures the percentage (without the sign).
[0-9]{1,3} Matches 1 to 3 digits.
\. Matches a literal dot.
[0-9]{1,2} Matches 1 to 2 digits.
\% Matches a percent sign.
/gm Closes the pattern; g allows multiple matches, and m makes ^ match at the beginning of each line rather than only at the start of the string.
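The answer gives only the pattern, not the streaming part. Here is a sketch of that part in Java rather than C: read one line at a time and apply the regex to each line, so the file is never held in memory as a whole. In C, the analogous approach would be fgets in a loop with POSIX regcomp/regexec. The class and method names are hypothetical. Since each line is matched separately, the g and m flags are unnecessary. BufferedReader.readLine accepts \n, \r and \r\n as line terminators, which may matter if play separates its progress updates with carriage returns (an assumption, not verified here).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlayProgress {
    // The answer's regex applied to a single line, so no g/m flags are needed.
    private static final Pattern PERCENT = Pattern.compile("^In:([0-9]{1,3}\\.[0-9]{1,2})%");

    // Reads the input line by line; only the current line and the results are kept.
    public static List<Double> percentages(Reader source) {
        List<Double> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = PERCENT.matcher(line);
                if (m.find()) {
                    result.add(Double.parseDouble(m.group(1)));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "File :1.mp3\n"
                + "In:0.00% 00:00:00.00 [00:03:14.51] Out:0 [ | ]\n"
                + "In:0.19% 00:00:00.37 [00:03:14.14] Out:16.4k [ | ]\n";
        System.out.println(percentages(new StringReader(sample))); // prints [0.0, 0.19]
    }
}
```

For a real file you would pass something like new java.io.FileReader("play.log") (a hypothetical file name) instead of the StringReader.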

org.apache.solr.search.SyntaxError: Cannot parse sku_str

I get the following error:
"error": {
"msg": "org.apache.solr.search.SyntaxError: Cannot parse 'sku_str:VFY:A5440M35A5ME': Encountered \" \":\" \": \"\" at line 1, column 11.",
"code": 400 }
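Escape the Solr query string: the second colon, in VFY:A5440M35A5ME, is being parsed as query syntax. Escape the value only, not the field name (sku_str:VFY\:A5440M35A5ME). In PHP you can use this function: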
SolrUtils::escapeQueryChars — Escapes a lucene query string
http://php.net/manual/en/solrutils.escapequerychars.php
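An alternative to escaping individual characters, sketched in Java with hypothetical names: wrap the value in double quotes. Inside a quoted phrase, the standard query parser treats only the double quote and the backslash specially, so only those need escaping. For a string field such as sku_str, the quoted value should still be matched as one exact term.

```java
public class SolrFieldQuery {
    // Builds field:"value". Inside a quoted phrase only '"' and '\' are special,
    // so those are the only characters escaped here.
    public static String fieldEquals(String field, String value) {
        String escaped = value.replace("\\", "\\\\")   // escape backslashes first
                              .replace("\"", "\\\""); // then double quotes
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(fieldEquals("sku_str", "VFY:A5440M35A5ME")); // prints sku_str:"VFY:A5440M35A5ME"
    }
}
```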

JavaCC C grammar and C bit fields: ParseException

I'm trying to use this JavaCC grammar https://java.net/downloads/javacc/contrib/grammars/C.jj to parse C code containing bit fields:
struct T{
int w:2;
};
struct T a;
The generated parser cannot parse this code:
$ javacc -DEBUG_PARSER=true C.jj && javac CParser.java && gcc -E input.c | java CParser
Java Compiler Compiler Version 5.0 (Parser Generator)
(type "javacc" with no arguments for help)
Reading from file C.jj . . .
File "TokenMgrError.java" is being rebuilt.
File "ParseException.java" is being rebuilt.
File "Token.java" is being rebuilt.
File "SimpleCharStream.java" is being rebuilt.
Parser generated successfully.
Note: CParser.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
C Parser Version 0.1Alpha: Reading from standard input . . .
Call: TranslationUnit
Call: ExternalDeclaration
Call: Declaration
Call: DeclarationSpecifiers
Call: TypeSpecifier
Call: StructOrUnionSpecifier
Call: StructOrUnion
Consumed token: <"struct" at line 5 column 1>
Return: StructOrUnion
Consumed token: <<IDENTIFIER>: "T" at line 5 column 8>
Consumed token: <"{" at line 5 column 9>
Call: StructDeclarationList
Call: StructDeclaration
Call: SpecifierQualifierList
Call: TypeSpecifier
Consumed token: <"int" at line 6 column 2>
Return: TypeSpecifier
Return: SpecifierQualifierList
Call: StructDeclaratorList
Call: StructDeclarator
Call: Declarator
Call: DirectDeclarator
Consumed token: <<IDENTIFIER>: "w" at line 6 column 6>
Return: DirectDeclarator
Return: Declarator
Return: StructDeclarator
Return: StructDeclaratorList
Return: StructDeclaration
Return: StructDeclarationList
Return: StructOrUnionSpecifier
Return: TypeSpecifier
Return: DeclarationSpecifiers
Return: Declaration
Return: ExternalDeclaration
Return: TranslationUnit
C Parser Version 0.1Alpha: Encountered errors during parse.
ParseException: Encountered " ":" ": "" at line 6, column 7.
Was expecting one of:
";" ...
"," ...
"(" ...
"[" ...
"[" ...
"(" ...
"(" ...
"," ...
";" ...
";" ...
";" ...
"[" ...
"(" ...
"(" ...
at CParser.generateParseException(CParser.java:4279)
at CParser.jj_consume_token(CParser.java:4154)
at CParser.StructDeclaration(CParser.java:433)
at CParser.StructDeclarationList(CParser.java:372)
at CParser.StructOrUnionSpecifier(CParser.java:328)
at CParser.TypeSpecifier(CParser.java:274)
at CParser.DeclarationSpecifiers(CParser.java:182)
at CParser.Declaration(CParser.java:129)
at CParser.ExternalDeclaration(CParser.java:96)
at CParser.TranslationUnit(CParser.java:77)
at CParser.main(CParser.java:63)
I tried to change (line 245)
(...) LOOKAHEAD( { isType(getToken(1).image) } )TypedefName() )
to
LOOKAHEAD( { isType(getToken(1).image) } ) TypedefName2()
(...)
void TypedefName2() : {}
{
TypedefName() (LOOKAHEAD(2) ":" <INTEGER_LITERAL> )?
}
but it doesn't work (same error).
Is there a simple way to fix the JavaCC grammar to handle bit fields?
Try fixing this by modifying the StructDeclarator() rule on lines 310..313 as follows:
void StructDeclarator() : {}
{
( Declarator() [ ":" ConstantExpression() ] | ":" ConstantExpression() )
}
The idea is to remove the need for lookahead: the parser chooses between the two alternatives simply by checking whether the struct declarator starts with a colon ":".
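To illustrate why no extra lookahead is needed, here is a tiny hand-written recursive-descent parser in Java for just this rule. It is not JavaCC output. The class name is hypothetical, a declarator is simplified to an identifier, and ConstantExpression is simplified to an integer literal. Each decision looks only at the next token: a leading ":" selects the unnamed bit field, and a ":" after the declarator selects the width. Requires Java 9+ for List.of.

```java
import java.util.List;

public class StructDeclaratorSketch {
    private final List<String> tokens;
    private int pos = 0;

    public StructDeclaratorSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    // StructDeclarator ::= Declarator [ ":" ConstantExpression ] | ":" ConstantExpression
    // Every choice is made from the next token alone.
    public String structDeclarator() {
        if (peek().equals(":")) {             // unnamed bit field, e.g. "int : 3;"
            next();
            return "unnamed bit field, width " + constant();
        }
        String name = declarator();
        if (peek().equals(":")) {             // named bit field, e.g. "int w : 2;"
            next();
            return name + ", bit field width " + constant();
        }
        return name;                          // ordinary member, e.g. "int x;"
    }

    // Simplification: a declarator is just an identifier here.
    private String declarator() {
        String t = next();
        if (!t.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalStateException("expected identifier, got " + t);
        }
        return t;
    }

    // Simplification: a constant expression is just an integer literal here.
    private String constant() {
        String t = next();
        if (!t.matches("[0-9]+")) {
            throw new IllegalStateException("expected integer literal, got " + t);
        }
        return t;
    }

    public static void main(String[] args) {
        // prints: w, bit field width 2
        System.out.println(new StructDeclaratorSketch(List.of("w", ":", "2")).structDeclarator());
    }
}
```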

WPF string escaping - Exception is thrown while creating control template

I am trying to construct a ControlTemplate from code-behind. Things were working fine until I recently found that the code throws an exception because of special characters in a string. The error message is constructed dynamically from a resource file.
The exception is:
A first chance exception of type 'System.Windows.Markup.XamlParseException' occurred in PresentationFramework.dll
Additional information: Name cannot begin with the '#' character, hexadecimal value 0x40. Line 1, position 537.
//In this case when exception is thrown,
//string errorMessage = "Name cannot contain any of the following characters $ \" # ; ^ | "
public static ControlTemplate GetErrorTemplate(string errorMessage)
{
string xamlString = "<ControlTemplate xmlns=\"http://schemas.microsoft.com/winfx/2006/xaml/presentation\" " +
"xmlns:x=\"http://schemas.microsoft.com/winfx/2006/xaml\" " +
"xmlns:nicefx=\"clr-namespace:NiceFx.Interop.UIComponents;assembly=NiceFx\" " +
"xmlns:wpfkit=\"http://schemas.microsoft.com/wpf/2008/toolkit\" >" +
" <DockPanel LastChildFill=\"True\">" +
"<TextBlock Foreground=\"White\" Background=\"Red\" FontSize=\"12\" Padding=\"2\" FontFamily=\"Trebuchet MS\" Margin=\"5,5,0,0\" TextWrapping=\"Wrap\" DockPanel.Dock=\"Bottom\" Text=\"" + errorMessage + "\"></TextBlock>" +
"<AdornedElementPlaceholder />" +
" </DockPanel>" +
" </ControlTemplate>";
//EXCEPTION OCCURS IN THIS LINE
ControlTemplate ct = (ControlTemplate)XamlReader.Load(XmlReader.Create(
new StringReader(xamlString)));
return ct;
}
How do I escape this string? I have tried everything I can think of without success.
According to the comment in your code, errorMessage contains a ", which is inserted unescaped into the XAML you are constructing. That " then acts as the closing quote of the Text attribute. The next non-whitespace character the parser encounters is #, which is not allowed in the name of a XAML attribute, so it stops and reports the error.
That covers the why. To escape it, use the XML entity for a double quote: &quot;. Other characters that are special in XML, such as & and <, need the same treatment (&amp;, &lt;) if they can appear in your message.
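A sketch of that escaping as a reusable function, written in Java for illustration (the class name is hypothetical; in .NET, System.Security.SecurityElement.Escape should perform comparable escaping of the same five characters). It replaces each XML-special character with its predefined entity, so the message can be placed safely inside a double-quoted attribute such as Text="...".

```java
public class XmlAttributeEscaper {
    // Replaces the five XML-special characters with their predefined entities.
    public static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String errorMessage = "Name cannot contain any of the following characters $ \" # ; ^ | ";
        // prints: Text="Name cannot contain any of the following characters $ &quot; # ; ^ | "
        System.out.println("Text=\"" + escape(errorMessage) + "\"");
    }
}
```

XAML adds one rule beyond plain XML: an attribute value that begins with { is read as a markup extension, and prefixing such a value with {} escapes it.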
