Flex: REJECT rejects one character at a time? - c

I'm parsing C++-style scoped names, e.g., A::B. I want to parse such a name as a single token for a type name if it was previously declared as a type; otherwise, I want to parse it as three tokens A, ::, and B.
Given:
L [A-Za-z_]
D [0-9]
S [ \f\r\t\v]
identifier {L}({L}|{D})*
sname {identifier}({S}*::{S}*{identifier})+
%%
{sname} {
if ( find_type( yytext ) )
return Y_TYPE_NAME;
REJECT;
}
{identifier} {
// ...
return Y_NAME;
}
However, for the input sequence:
A::BB -> Not a type; returned as "A", "::", "BB"
(Declare A::B as a type.)
A::BB
What happens on the second parse of A::BB, REJECT is called and flex discards only 1 character from the end and tries to match A::B (one B). This matches the previously declared A::B in the {sname} rule which is wrong.
What I assumed REJECT did was to proceed to the second-best matching rule with the same input. Hence, I expected it to match A alone in {identifier} and just leave ::BB on the input stream. But, as shown, that's not what happens. It peels off one character at a time from the end of the input and re-attempts a match.
Adding in yyless() to chop off the ::BB doesn't help:
{sname} {
if ( find_type( yytext ) )
return Y_TYPE_NAME;
char const *const sep = strpbrk( yytext, ": \t" );
size_t keep_len = (size_t)(sep - yytext);
yyless( keep_len );
REJECT;
}
The only thing that I've found that works is:
{sname} {
if ( find_type( yytext ) )
return Y_TYPE_NAME;
char const *const sep = strpbrk( yytext, ": \t" );
size_t keep_len = (size_t)(sep - yytext);
yyless( keep_len );
goto identifier;
}
{identifier} {
identifier:
// ...
return Y_NAME;
}
But these seems kind of hack-ish. Is there a more canonical "flex way" to do what I want?
Despite the hack-ish-ness, is there actually anything wrong with my solution? I.e., would it not work in some cases? Or not work in some versions of Flex?
Yes, I'm aware that, even if it did work the way I want, this won't parse all contrived C++ scoped names; but it's good enough.
Yes, I'm aware that generally parsing such things should be done as separate tokens, but C-like languages are hard to parse since they're not context-free. Knowing when you have a type really helps.
Yes, I'm aware that REJECT slows down the lexer. This is (mostly) for an interactive and command-line tool in a terminal, so the human is the slowest component.
I'd like to focus on the problem at hand with the code as it mostly is. Mine is more of a question about how to use REJECT, yyless(), etc., to get the desired behavior. Thanks.

REJECT does not make any distinction between different rules; it just falls back to the next possible accepting pattern (which might not even be shorter, if there's a lower-precedence rule which matches the same token.) That might be a shorter match of the same pattern. (Normally, Flex chooses the longest match out of the possible matches of the regular expression. With REJECT, the shorter matches are also considered.)
So you can avoid the false match of A::B for input A::BB by using trailing context: [Note 1]
{sname}/[^[:alnum:]_] {...}
(In Flex, you can't put trailing context in a macro because the expansion is surrounded with parentheses.)
You could use that solution if you wanted to try all possible complete prefixes of id1::id2::id3::id4 starting with the longest one (id1::id2::id3::id4) and falling back to each shorter prefix in turn (id1::id2::id3, id1::id2, id1). The trailing context prevents intermediate matches in the middle of an identifier.
I'm not sure if that is the problem you are trying to solve, though. And even if it is, it doesn't really solve the problem of what to do after you fall back to an acceptable solution, because you probably don't want to redo the search for prefixes in the middle of the sequence. In other words, if the original were A::B::A::B, and A::B is a "known prefix", the tokens returned should (presumably) be A::B, ::, A, ::, B rather than A::B, ::, A::B.
On the other hand, you might not care about previously defined prefixes; only whether the complete sequence is a known type. In that case, the possible token sequences for A::B::C::D are [TYPENAME(A::B::C::D)] and [ID(A),, ::, ID(B), ::, ID(C), ID(D)]. For this case, you don't want to restrict the {sname} fallbacks; you want to eliminate them. After the fallback, you only want to try matches for different rules.
There are a variety of alternative solutions (which do not use REJECT), mostly relying on the possibility of using start conditions. (You still need to think about what happens after the fallback. Start conditions can be useful for that, as well.)
One fairly general solution is to define a start condition which excludes the rule you want to fall back from. Once you've determined that you want to fall back to a different rule, you change to a start condition which excludes the rule which just matched and then call yyless(0) (without returning). That will cause Flex to rescan the current token from the beginning in the new start condition. (If you have several possible rules which might match, you will need several start conditions. This could get out of hand, but in most cases the possible set of matching rules is very limited.)
At some point, you need to return to the previous start condition. (You could use a start condition stack if it's not trivial to figure out which was the previous start condition.) Ideally, to avoid false interior matches, you would want to return to the previous start condition only when you reached the end of the first (incorrect) match. That's easy to do if you are tracking the input position for each token; you just need to save the position just before calling yyless(0). (You also need to correct the token input position by subtracting yyleng before yyless sets yyleng to 0).
Rescanning from the beginning of the token might seem inefficient, but it's less inefficient than the overhead imposed by REJECT. And the overhead of REJECT affects the entire scanner operation, while the rescanning solution is essentially free except for the tokens you happen to rescan. [Note 2]
(BEGIN(some_state); yyless(0);) is a reasonably common flex idiom; this is not the only use case. Another one is the answer to the question "How do I run a start condition until I reach a token I can't identify without consuming that token?")
But I think that in this case there is a simpler solution, using yymore to accumulate the token. (This avoids having to do your own dynamically expanded token buffer.) Again, there are two possibilities, depending on whether you might allow initial prefix matches or restrict the possibilities to either the full sequence or the first identifier.
But the outline is the same: Match the shortest valid possibility first, remember how long the token was at that point, and then use a start condition to enable a pattern which extends the token, either to the next possibility or to the end of the sequence, depending on your needs. Before continuing with the scanner loop, you call yymore() which indicates to Flex that the next pattern extends the token rather than replacing it.
At each possible match end, you test to see if that would really be a valid token and if so, record the position (and whatever else you might need to recall, such as the token type). When you reach a point where you can no longer extend the match, you use yyless to fall back to the last valid match point, and return the token.
This is slightly less inefficient than the pure yyless() solution because it avoids the rescan of the token which is returned. (All these solutions, including REJECT, do rescan the text following the selected match if it is shorter than the longest possible extended match. That's theoretically avoidable but since it's not a lot of overhead, it doesn't seem worthwhile to build a complex mechanism to avoid it.)
Again, you probably want to avoid trying to extend token matches after the fallback until you reach the longest extent. This can be solved the same way, by recording the longest matched extent, but the start condition handling is a bit simpler.
Here's some not-very-well tested code for the simpler problem, where only the first identifier and the full match are possible:
%{
/* Code to track current token position and the fallback position.
* It's a simple byte count; line-ends are not dealt with.
*/
static int token_pos = 0;
static int token_max_pos = 0;
/* This is done before every action, even empty actions */
#define YY_USER_ACTION token_pos += yyleng;
/* FALLBACK needs to undo YY_USER_ACTION */
#define FALLBACK(to) do { token_pos -= yyleng - to; yyless(to); } while (0)
/* SET_MORE needs to pre-undo the next YY_USER_ACTION */
#define SET_MORE(X) do { token_pos -= yyleng; yymore(); } while(0)
%}
%x EXTEND_ID
ident [[:alpha:]_][[:alnum:]_]*
%%
int fallback_leng = 0;
/* The trailing context here is to avoid triggering EOF in
* the EXTEND_ID state.
*/
{ident}/[^[:alnum:]_] {
/* In the fallback region, don't attempt to extend the match. */
if (token_pos <= token_max_pos)
return Y_IDENT;
fallback_leng = yyleng;
BEGIN(EXTEND_ID);
SET_MORE(yymore);
}
{ident} { return find_type(yytext) ? Y_TYPE_NAME : Y_IDENT; }
<EXTEND_ID>{
([[:space:]]*"::"[[:space:]]*{ident})*|.|\n {
BEGIN(INITIAL);
if (yyleng == fallback_leng + 1)
FALLBACK(fallback_leng);
if (find_type(yytext))
return Y_TYPE_NAME;
else {
FALLBACK(fallback_leng);
return Y_IDENT;
}
}
}
Notes
At least, I think you can do that. I haven't ever tried and REJECT does impose a number of limitations on other scanner features. Really, it's a historical artefact which massively slows down lexical analysis and should generally be avoided.
The use of REJECT anywhere in the rules causes Flex to switch to a different template in which a list of endpoints is retained, which has a cost in both time and space. Another consequence is that Flex can no longer resize the input buffer.

Related

How can I ignore an OS Code 5 Error in Rust? [duplicate]

I noticed that Rust does not have exceptions. How to do error handling in Rust and what are the common pitfalls? Are there ways to control flow with raise, catch, reraise and other stuff? I found inconsistent information on this.
Rust generally solves errors in two ways:
Unrecoverable errors. Once you panic!, that's it. Your program or thread aborts because it encounters something it can't solve and its invariants have been violated. E.g. if you find invalid sequences in what should be a UTF-8 string.
Recoverable errors. Also called failures in some documentation. Instead of panicking, you emit a Option<T> or Result<T, E>. In these cases, you have a choice between a valid value Some(T)/Ok(T) respectively or an invalid value None/Error(E). Generally None serves as a null replacement, showing that the value is missing.
Now comes the hard part. Application.
Unwrap
Sometimes dealing with an Option is a pain in the neck, and you are almost guaranteed to get a value and not an error.
In those cases it's perfectly fine to use unwrap. unwrap turns Some(e) and Ok(e) into e, otherwise it panics. Unwrap is a tool to turn your recoverable errors into unrecoverable.
if x.is_some() {
y = x.unwrap(); // perfectly safe, you just checked x is Some
}
Inside the if-block it's perfectly fine to unwrap since it should never panic because we've already checked that it is Some with x.is_some().
If you're writing a library, using unwrap is discouraged because when it panics the user cannot handle the error. Additionally, a future update may change the invariant. Imagine if the example above had if x.is_some() || always_return_true(). The invariant would changed, and unwrap could panic.
? operator / try! macro
What's the ? operator or the try! macro? A short explanation is that it either returns the value inside an Ok() or prematurely returns error.
Here is a simplified definition of what the operator or macro expand to:
macro_rules! try {
($e:expr) => (match $e {
Ok(val) => val,
Err(err) => return Err(err),
});
}
If you use it like this:
let x = File::create("my_file.txt")?;
let x = try!(File::create("my_file.txt"));
It will convert it into this:
let x = match File::create("my_file.txt") {
Ok(val) => val,
Err(err) => return Err(err),
};
The downside is that your functions now return Result.
Combinators
Option and Result have some convenience methods that allow chaining and dealing with errors in an understandable manner. Methods like and, and_then, or, or_else, ok_or, map_err, etc.
For example, you could have a default value in case your value is botched.
let x: Option<i32> = None;
let guaranteed_value = x.or(Some(3)); //it's Some(3)
Or if you want to turn your Option into a Result.
let x = Some("foo");
assert_eq!(x.ok_or("No value found"), Ok("foo"));
let x: Option<&str> = None;
assert_eq!(x.ok_or("No value found"), Err("No value found"));
This is just a brief skim of things you can do. For more explanation, check out:
http://blog.burntsushi.net/rust-error-handling/
https://doc.rust-lang.org/book/ch09-00-error-handling.html
http://lucumr.pocoo.org/2014/10/16/on-error-handling/
If you need to terminate some independent execution unit (a web request, a video frame processing, a GUI event, a source file to compile) but not all your application in completeness, there is a function std::panic::catch_unwind that invokes a closure, capturing the cause of an unwinding panic if one occurs.
let result = panic::catch_unwind(|| {
panic!("oh no!");
});
assert!(result.is_err());
I would not grant this closure write access to any variables that could outlive it, or any other otherwise global state.
The documentation also says the function also may not be able to catch some kinds of panic.

Flex does not recognize identifiers

I am trying to implement a very simple parser using flex. I am currently stuck in the ID recognition. That is my code:
ID [a−zA−Z_][a−zA−Z0−9_]*
...
{ID} { printf( "An identifier: %s\n", yytext ); return TOK_ID;}
However what I get is only the first letter of the identifier, for example if I try to parse:
int _underscore ;
The result is:
An identifier: _
Any advice?
EDIT:
With a more accurate analysis I have figured out that the code is able to recognize only the id with a,z,A,Z,_, that are the explicit characters in the regular expression. I did not find anything like that online, is that a bug?
EDIT2:
If I modify the code in that way all work
ID [a−zA−Z_][a−zA−Z0−9_]*
...
[a−zA−Z_][a−zA−Z0−9_]* { printf( "An identifier: %s\n", yytext ); return TOK_ID;}
According to the documentation it should work also in the other way.
This is a character encoding issue. In your copy-and-pasted source code, the things that look like ASCII hyphens (-, code U+2D) in your definition of ID:
ID [a−zA−Z_][a−zA−Z0−9_]*
aren't. Instead they're unicode minus signs (−, U+2212). If you replace the incorrect minus signs with the correct hyphens, the line will look like:
ID [a-zA-Z_][a-zA-Z0-9_]*
Depending on your font, if you look very closely, you may see a difference between the − in the first version and the - in the second.
Anyway, replace your ID definition with the second version above (or else retype it from scratch, and all should be well.

customizing completion of GtkComboBoxText

How can I customize the completion of a GtkComboBoxText with both a "static" aspect and a "dynamic" one? The static aspect is because some entries are known and added to the combo-box-text at construction time with gtk_combo_box_text_append_text. The dynamic aspect is because I also need to complete thru some callback function(s), that is to complete dynamically -after creation of the GtkComboBoxText widget- once several characters has been typed.
My application uses Boehm's GC (except for GTK objects of course) like Guile or SCM or Bigloo are doing. It can be seen as an experimental persistent dynamic-typed programming language implementation with an integrated editor coded on and for Debian/Linux/x86-64 with the system GTK3.21 library, it is coded in C99 (some of which is generated) and is compiled with GCC6.
(I don't care about non-Linux systems, GTK3 libraries older than GTK3.20, GCC compiler older that GCC6)
question details
I'm entering (inputting into the GtkComboBoxText) either a name, or an object-id.
The name is C-identifier-like but starts with a letter and cannot end with an underscore. For example, comment, if, the_GUI, the_system, payload_json, or x1 are valid names (but _a0bcd or foobar_ are invalid names, because they start or end with an underscore). I currently have a big dozen of names, but I could have a few thousands of them. So it would be reasonable to offer a completion once only a single or perhaps two letters has been typed, and completion for names can happen statically because they are not many of them (so I feel reasonable to call gtk_combo_box_append_text for each name).
The object-id starts with an underscore followed by a digit and has exactly 18 alphanumeric (sort-of random) characters. For example, _5Hf0fFKvRVa71ZPM0, _8261sbF1f9ohzu2Iu, _0BV96V94PJIn9si1K are possible object-ids. Actually it is 96 almost random bits (probably only 294 are possible). The object-id plays the role of UUIDs (in the sense that it is assumed to be world-wide unique for distinct objects) but has a C friendly syntax. I currently have a few dozen of objects-ids, but I could have a few hundred of thousands (or maybe a million) of them. But given a prefix of four characters like _6S3 or _22z, I am assuming that only a reasonable number (probably at most a dozen, and surely no more than a thousand) object-ids exist in my application with that prefix. Of course it would be unreasonable to register (statically) a priori all the object ids (the completion has to happen after four characters have been typed, and should happen dynamically).
So I want a completion that works both on names (e.g. typing one letter perhaps followed by another alphanum character should be enough to propose a completion of at most a hundred choices), and on object-ids (typing four characters like _826 should be enough to trigger a completion of probably at most a few dozen choices, perhaps a thousand ones if unlucky).
Hence typing the three keys p a tab would offer completion with a few names like payload_json or payload_vectval etc... and typing the five keys _ 5 H f tab would offer completion with very few object-ids, notably _5Hf0fFKvRVa71ZPM0
sample incomplete code
So far I coded the following:
static GtkWidget *
mom_objectentry (void)
{
GtkWidget *obent = gtk_combo_box_text_new_with_entry ();
gtk_widget_set_size_request (obent, 30, 10);
mo_value_t namsetv = mo_named_objects_set ();
I have Boehm-garbage-collected values, and mo_value_t is a pointer to any of them. Values can be tagged integers, pointers to strings, objects, or tuples or sets of objects. So namesetv now contains the set of named objects (probably less than a few thousand of named objects).
int nbnam = mo_set_size (namsetv);
MOM_ASSERTPRINTF (nbnam > 0, "bad nbnam");
mo_value_t *namarr = mom_gc_alloc (nbnam * sizeof (mo_value_t));
int cntnam = 0;
for (int ix = 0; ix < nbnam; ix++)
{
mo_objref_t curobr = mo_set_nth (namsetv, ix);
mo_value_t curnamv = mo_objref_namev (curobr);
if (mo_dyncast_string (curnamv))
namarr[cntnam++] = curnamv;
}
qsort (namarr, cntnam, sizeof (mo_value_t), mom_obname_cmp);
for (int ix = 0; ix < cntnam; ix++)
gtk_combo_box_text_append_text (GTK_COMBO_BOX_TEXT (obent),
mo_string_cstr (namarr[ix]));
at this point I have sorted all the (few thousands at most) names and added "statically" them using gtk_combo_box_text_append_text.
GtkWidget *combtextent = gtk_bin_get_child (GTK_BIN (obent));
MOM_ASSERTPRINTF (GTK_IS_ENTRY (combtextent), "bad combtextent");
MOM_ASSERTPRINTF (gtk_entry_get_completion (GTK_ENTRY (combtextent)) ==
NULL, "got completion in combtextent");
I noticed with a bit of surprise that gtk_entry_get_completion (GTK_ENTRY (combtextent)) is null.
But I am stuck here. I am thinking of:
Having some mom_set_complete_objectid(const char*prefix) which given a prefix like "_47n" of at least four characters would return a garbage collected mo_value_t representing the set of objects with that prefix. This is very easy to code for me, and is nearly done.
Make my own local GtkEntryCompletion* mycompl = ..., which would complete like I want. Then I would put it in the text entry combtextent of my gtk-combo-box-text using gtk_entry_set_completion(GTK_ENTRY(combtextent), mycompl);
Should it use the entries added with gtk_combo_box_text_append_text for the "static" name completion role? How should I dynamically complete using the dynamic set value returned from my mom_set_complete_objectid; given some object-pointer obr and some char bufid[20]; I am easily and quickly able to fill it with the object-id of that object obr with mo_cstring_from_hi_lo_ids(bufid, obr->mo_ob_hid, obr->mo_ob_loid)..
I don't know how to code the above. For reference, I am now just returning the combo-box-text:
// if the entered text starts with a letter, I want it to be
// completed with the appended text above if the entered text starts
// with an undersore, then a digit, then two alphanum (like _0BV or
// _6S3 for example), I want to call a completion function.
#warning objectentry: what should I code here?
return obent;
} /* end mom_objectentry */
Is my approach the right one?
The mom_objectentry function above is used to fill modal dialogs with short lifetime.
I am favoring simple code over efficiency. Actually, my code is temporary (I'm hoping to bootstrap my language, and generate all its C code!) and in practice I'll probably have only a few hundred names and at most a few dozen of thousands of object-ids. So performance is not very important, but simplicity of coding (some conceptually "throw away" code) is more important.
I don't want (if possible) to add my own GTK classes. I prefer using existing GTK classes and widgets, customizing them with GTK signals and callbacks.
context
My application is an experimental persistent programming language and implementation with a near Scheme or Python (or JavaScript, ignoring the prototype aspect, ...) semantics but with a widely different (not yet implemented in september 7th, 2016) syntax (to be shown & input in GTK widgets), using the Boehm garbage collector for values (including objects, sets, tuples, strings...)... Values (including objects) are generally persistent (except the GTK related data : the application starts with a nearly empty window). The entire language heap is persisted in JSON-like syntax in some Sqlite "database" (generated at application exit) dumped into _momstate.sql which is re-loaded at application startup. Object-ids are useful to show object references to the user in GTK widgets, for persistence, and to generate C code related to the objects (e.g. the object of id _76f7e2VcL8IJC1hq6 could be related to a mo_76f7e2VcL8IJC1hq6 identifier in some generated C code; this is partly why I have my object-id format instead of using UUIDs).
PS. My C code is GPLv3 free software and available on github. It is the MELT monitor, branch expjs, commit e2b3b99ef66394...
NB: The objects mentioned here are implicitly my language objects, not GTK objects. The all have a unique object-id, and some but not most of them are named.
I will not show exact code on how to do it because I never did GTK & C only GTK & Python, but it should be fine as the functions in C and Python functions can easily be translated.
OP's approach is actually the right one, so I will try to fill in the gaps. As the amount of static options is limited probably won't change to much it indeed makes sense to add them using gtk_combo_box_text_append which will add them to the internal model of the GtkComboBoxText.
Thats covers the static part, for the dynamic part it would be perfect if we could just store this static model and replace it with a temporay model using gtk_combo_box_set_model() when a _ was found at the start of the string. But we shouldn't do this as the documentation says:
You should not call gtk_combo_box_set_model() or attempt to pack more cells into this combo box via its GtkCellLayout interface.
So we need to work around this, one way of doing this is by adding a GtkEntryCompletion to the entry of the GtkComboBoxText. This will make the entry attempt to complete the current string based on its current model. As an added bonus it can also add all the character all options have in common like this:
As we don't want to load all the dynamic options before hand I think the best approach will be to connect a changed listener to the GtkEntry, this way we can load the dynamic options when we have a underscore and some characters.
As the GtkEntryCompletion uses a GtkListStore internally, we can reuse part of the code Nominal Animal provided in his answer. The main difference being: the connect is done on the GtkEntry and the replacing of GtkComboText with GtkEntryCompletion inside the populator. Then everything should be fine, I wish I would be able to write decent C then I would have provided you with code but this will have to do.
Edit: A small demo in Python with GTK3
import gi
gi.require_version('Gtk', '3.0')
import gi.repository.Gtk as Gtk
class CompletingComboBoxText(Gtk.ComboBoxText):
def __init__(self, static_options, populator, **kwargs):
# Set up the ComboBox with the Entry
Gtk.ComboBoxText.__init__(self, has_entry=True, **kwargs)
# Store the populator reference in the object
self.populator = populator
# Create the completion
completion = Gtk.EntryCompletion(inline_completion=True)
# Specify that we want to use the first col of the model for completion
completion.set_text_column(0)
completion.set_minimum_key_length(2)
# Set the completion model to the combobox model such that we can also autocomplete these options
self.static_options_model = self.get_model()
completion.set_model(self.static_options_model)
# The child of the combobox is the entry if 'has_entry' was set to True
entry = self.get_child()
entry.set_completion(completion)
# Set the active option of the combobox to 0 (which is an empty field)
self.set_active(0)
# Fill the model with the static options (could also be used for a history or something)
for option in static_options:
self.append_text(option)
# Connect a listener to adjust the model when the user types something
entry.connect("changed", self.update_completion, True)
def update_completion(self, entry, editable):
# Get the current content of the entry
text = entry.get_text()
# Get the completion which needs to be updated
completion = entry.get_completion()
if text.startswith("_") and len(text) >= completion.get_minimum_key_length():
# Fetch the options from the populator for a given text
completion_options = self.populator(text)
# Create a temporary model for the completion and fill it
dynamic_model = Gtk.ListStore.new([str])
for completion_option in completion_options:
dynamic_model.append([completion_option])
completion.set_model(dynamic_model)
else:
# Restore the default static options
completion.set_model(self.static_options_model)
def demo():
# Create the window
window = Gtk.Window()
# Add some static options
fake_static_options = [
"comment",
"if",
"the_GUI",
"the_system",
"payload_json",
"x1",
"payload_json",
"payload_vectval"
]
# Add the the Combobox
ccb = CompletingComboBoxText(fake_static_options, dynamic_option_populator)
window.add(ccb)
# Show it
window.show_all()
Gtk.main()
def dynamic_option_populator(text):
# Some fake returns for the populator
fake_dynamic_options = [
"_5Hf0fFKvRVa71ZPM0",
"_8261sbF1f9ohzu2Iu",
"_0BV96V94PJIn9si1K",
"_0BV1sbF1f9ohzu2Iu",
"_0BV0fFKvRVa71ZPM0",
"_0Hf0fF4PJIn9si1Ks",
"_6KvRVa71JIn9si1Kw",
"_5HKvRVa71Va71ZPM0",
"_8261sbF1KvRVa71ZP",
"_0BKvRVa71JIn9si1K",
"_0BV1KvRVa71ZPu2Iu",
"_0BV0fKvRVa71ZZPM0",
"_0Hf0fF4PJIbF1f9oh",
"_61sbFV0fFKn9si1Kw",
"_5Hf0fFKvRVa71ozu2",
]
# Only return those that start with the text
return [fake_dynamic_option for fake_dynamic_option in fake_dynamic_options if fake_dynamic_option.startswith(text)]
if __name__ == '__main__':
demo()
Gtk.main()
Here is my suggestion:
Use a GtkListStore to contain a list of GTK-managed strings (essentially, copies of your identifier string) that match the current prefix string.
(As documented for gtk_list_store_set(), a G_TYPE_STRING item is copied. I consider the overhead of the extra copy acceptable here; it should not affect real-world performance much anyway, I think, and in return, GTK+ will manage the reference counting for us.)
The above is implemented in a GTK+ callback function, which gets an extra pointer as payload (set at the time the GUI is created or activated; I suggest you use some structure to keep references you need to generate the matches). The callback is connected to the combobox popup signal, so that it gets called whenever the list is expanded.
Note that as B8vrede noted in a comment, a GtkComboBoxText should not be modified via its model; that is why one should/must use a GtkComboBox instead.
Practical example
For simplicity, let's assume all the data you need to find or generate all known identifiers matched against is held in a structure, say
struct generator {
/* Whatever data you need to generate prefix matches */
};
and the combo box populator helper function is then something like
static void combo_box_populator(GtkComboBox *combobox, gpointer genptr)
{
struct generator *const generator = genptr;
GtkListStore *combo_list = GTK_LIST_STORE(gtk_combo_box_get_model(combobox));
GtkWidget *entry = gtk_bin_get_child(GTK_BIN(combobox));
const char *prefix = gtk_entry_get_text(GTK_ENTRY(entry));
const size_t prefix_len = (prefix) ? strlen(prefix) : 0;
GtkTreeIter iterator;
/* Clear the current store */
gtk_list_store_clear(combo_list);
/* Initialize the list iterator */
gtk_tree_model_get_iter_first(GTK_TREE_MODEL(combo_list), &iterator);
/* Find all you want to have in the combo box;
for each const char *match, do:
*/
gtk_list_store_append(combo_list, &iterator);
gtk_list_store_set(combo_list, &iterator, 0, match, -1);
/* Note that the string pointed to by match is copied;
match is not referred to after the _set() returns.
*/
}
When the UI is built or activated, you need to ensure the GtkComboBox has an entry (so the user can write text into it), and a GtkListStore model:
struct generator *generator;
GtkWidget *combobox;
GtkListStore *combo_list;
combo_list = gtk_list_store_new(1, G_TYPE_STRING);
combobox = gtk_combo_box_new_with_model_and_entry(GTK_TREE_MODEL(combo_list));
gtk_combo_box_set_id_column(GTK_COMBO_BOX(combobox), 0);
gtk_combo_box_set_entry_text_column(GTK_COMBO_BOX(combobox), 0);
gtk_combo_box_set_button_sensitivity(GTK_COMBO_BOX(combobox), GTK_SENSITIVITY_ON);
g_signal_connect(combobox, "popup", G_CALLBACK(combo_box_populator), generator);
On my system, the default pop-up accelerator is Alt+Down, but I assume you've already changed that to Tab.
I have a crude working example here (a .tar.xz tarball, CC0): it reads lines from standard input, and lists the ones matching the user prefix in reverse order in the combo box list (when popped-up). If the entry is empty, the combobox will contain all input lines. I didn't change the default accelerators, so instead of Tab, try Alt+Down.
I also have the same example, but using GtkComboBoxText instead, here (also CC0). This does not use a GtkListStore model, but uses gtk_combo_box_text_remove_all() and gtk_combo_box_text_append_text() functions to manipulate the list contents directly. (There is just a few different lines in the two examples.) Unfortunately, the documentation is not explicit whether this interface references or copies the strings. Although copying is the only option that makes sense, and this can be verified from the current Gtk+ sources, the lack of explicit documentation makes me hesitant.
Comparing the two examples I linked to above (both grab some 500 random words from /usr/share/dict/words if you compile and run it with make), I don't see any speed difference. Both use the same naïve way of picking prefix matches from a linked list, which means the two methods (GtkComboBox + model, or GtkComboBoxText) should be about equally fast.
On my own machine, both get annoyingly slow with more than 1000 or so matches in the popup; with just a hundred or less matches, it feels instantaneous. This, to me, indicates that the slow/naïve way of picking prefix matches from a linked list is not the culprit (because the entire list is traversed in both cases), but that the GTK+ combo boxes are just not designed for large lists. (The slowdown is definitely much, much worse than linear.)

How to convert ASCII integers to string in Streambase?

I am using streambase, and have created a map to convert incoming integers (0-255) into character codes. Right now, I'm using manual if/else statements. For example:
if (message_code == '106') then
message_code = 'J'
else if (message_code == '110') then
message_code= 'N'
However, I'd like to just use this more generally. Searching Streambase Studio help didn't really provide anything as far as I could tell. I know this can be done in Java, but would probably require calling a java external function. I'm not so competent at this, so am a bit lost.
Is there a better way to do this in StreamBase?
Initial investigations confirm that the StreamBase Expression Language or built-in Functions don't have an obvious way to convert ints representing ASCII character values into a one-character string corresponding to the ASCII value. If there's some non-obvious trick I discover later, I'll come back and edit this answer.
I definitely wouldn't use a 255-clause if/then/else expression!
I would probably use a Java function and here is an example I've lightly tested:
package example;
public class ASCIIIntToString {
public static String ASCIIToString(Integer a) {
if (a == null || a < 0 || a > 255) {
return null;
} else {
return Character.toString((char) (int) a);
}
}
}
And the custom-function declaration to drop into your sbd.sbconf would look like this:
<custom-functions>
<custom-function alias="ASCIIToString"
args="auto"
class="example.ASCIIIntToString"
language="java"
name="ASCIIToString"
type="simple"/>
</custom-functions>
But if you would rather not drop into Java and stay in EventFlow then two methods come to mind:
Load a CSV file with int -> string mappings in them into a Query Table and dto a lookup with Query Read operator, perhaps using the Initial Contents tab to reference the CSV file since ASCII isn't going to change anytime soon.
Create a list constant called, say, ASCII, that has all the strings you want to convert to and index the list by the int. For example, in a Map operator do Add c ASCII[i], and add some bounds checking logic (perhaps in a user function).
Disclosure/Disclaimer: I am an employee of TIBCO Software, Inc. Opinions expressed here are my own and not TIBCO's.

Recovering error tokens in parsing (Lemon)

I'm using Lemon as a parser generator, its error handling is the same as yacc's and bison's if you don't know Lemon.
Lemon has an option to define the error token in a set of rules in order to catch parsing errors. The default behavior of the generated parser is to destroy the token causing the error; is there any way to override this behavior so that I can keep the token?
Here's an example to show what's happening: basically I'm appending the tokens for each rule together to reform the input string, here's an example grammar:
input ::= string(A) { printf("%s", A); } // Print the result
string(A) ::= string(B) part(C). { A = append(B, C); }
string(A) ::= part(B). { A = B; }
part(A) ::= NUMBER(B) NAME(C). { A = append(C, B); } // Rearrange the number and name
part(A) ::= error(B). { A = B; } // On error keep the token anyways
On input:
"Username 1234Joseph"
I get output:
"Joseph1234"
Because the text "Username " is junked by the parser in the part(A) ::= error(B) rule, but I really want:
"Username Joseph1234"
as output.
If you can solve this problem in bison or another parser generator I would accept that as an answer :)
With yacc/bison, a parsing error drops the tool into error recovery mode, if possible. It will attempt to discard tokens on its way to a "clean" state.
I'm unable to find a reference for lemon, so I can't show some lemon code to fix this, but with yacc/bison, one would use the rules here.
Namely, you need to adjust your error rule to state that the parser is ok with yyerrok to prevent it from dropping tokens. Next, it will attempt to reread the "bad" token, so you need to clear it with yyclearin. Finally, since the rule attached to your error code contains the contents of your token, you will need to set up a function that adjusts your input stack, by taking the current token contents and creating a new (proper) token with the same contents.
As an example, if a grammar defined as MyOther MyOther saw MyTok MyOther:
stack
MyTok: "the text"
MyOther: "new text"
stack
MyOther: "the text"
MyOther: "new text"
To accomplish this, look into using yybackup. I'm unable to find an alternative method, though yybackup is frowned upon.
It's an old one, but why not...
The grammar must include spaces. At the moment the grammar only allows a sequence of NUMBER NAME tokens (without any space between the tokens).

Resources