Query on TFP Probabilistic Model - tensorflow-probability

In the TFP tutorial, the model output is a Normal distribution. I noted that the output can be replaced by an IndependentNormal layer. In my model, y_true is a binary class. Therefore, I used an IndependentBernoulli layer instead of an IndependentNormal layer.
After building the model, I found that it has two output parameters. That doesn't make sense to me, since the Bernoulli distribution has only one parameter. Do you know what went wrong?
# Define the prior weight distribution as Normal of mean=0 and stddev=1.
# Note that, in this example, the prior distribution is not trainable,
# as we fix its parameters.
def prior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    prior_model = Sequential([
        tfpl.DistributionLambda(
            lambda t: tfd.MultivariateNormalDiag(loc=tf.zeros(n), scale_diag=tf.ones(n))
        )
    ])
    return prior_model
# Define variational posterior weight distribution as multivariate Gaussian.
# Note that the learnable parameters for this distribution are the means,
# variances, and covariances.
def posterior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    posterior_model = Sequential([
        tfpl.VariableLayer(tfpl.MultivariateNormalTriL.params_size(n), dtype=dtype),
        tfpl.MultivariateNormalTriL(n)
    ])
    return posterior_model
# Create a probabilistic DL model
model = Sequential([
    tfpl.DenseVariational(units=16,
                          input_shape=(6,),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/X_train.shape[0],
                          activation='relu'),
    tfpl.DenseVariational(units=16,
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/X_train.shape[0],
                          activation='sigmoid'),
    tfpl.DenseVariational(units=tfpl.IndependentBernoulli.params_size(1),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/X_train.shape[0]),
    tfpl.IndependentBernoulli(1, convert_to_tensor_fn=tfd.Bernoulli.logits)
])
model.summary()
[screenshot of the model.summary() output from Google Colab]

I agree the summary display is confusing, but I think this is an artifact of the way TFP layers are implemented to interact with Keras. During normal operation there will only be one return value from a DistributionLambda layer, but in some contexts (that I don't fully grok) DistributionLambda.call may return both a distribution and a side-result. I think the summary plumbing triggers this for some reason, so it looks like there are two outputs, but in practice there will only be one. Try calling your model object on X_train, and you'll see you get a single distribution out (its type is actually something called TensorCoercible, which is a wrapper around a distribution that lets you pass it into TF ops that call tf.convert_to_tensor; the resulting value for that op will be the result of calling your convert_to_tensor_fn on the enclosed distribution).
In summary, your distribution layer is fine but the summary is confusing. It could probably be fixed; I'm not Keras-knowledgeable enough to opine on how hard it would be.
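A quick way to see both points for yourself (a sketch; X_train as in the question):
print(tfpl.IndependentBernoulli.params_size(1))  # -> 1: a single logit parameter
dist = model(X_train[:5])
print(type(dist))           # a TensorCoercible wrapping the Bernoulli
print(dist.mean().shape)    # (5, 1): one output distribution per input example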
Side note: you can omit the event_shape=1 parameter -- the default value is (), or "scalar", which will behave the same.
HTH!

Related

How do I get a dataframe or database write from TFX BulkInferrer?

I'm very new to TFX, but have an apparently-working ML Pipeline which is to be used via BulkInferrer. That seems to produce output exclusively in Protobuf format, but since I'm running bulk inference I want to pipe the results to a database instead. (DB output seems like it should be the default for bulk inference, since both Bulk Inference & DB access take advantage of parallelization... but Protobuf is a per-record, serialized format.)
I assume I could use something like Parquet-Avro-Protobuf to do the conversion (though that's in Java and the rest of the pipeline's in Python), or I could write something myself to consume all the protobuf messages one-by-one, convert them into JSON, deserialize the JSON into a list of dicts, and load the dict into a Pandas DataFrame, or store it as a bunch of key-value pairs which I treat like a single-use DB... but that sounds like a lot of work and pain involving parallelization and optimization for a very common use case. The top-level Protobuf message definition is Tensorflow's PredictionLog.
This must be a common use case, because TensorFlow Model Analysis functions like this one consume Pandas DataFrames. I'd rather be able to write directly to a DB (preferably Google BigQuery) or a Parquet file (since Parquet / Spark seems to parallelize better than Pandas), and again, those seem like they should be common use cases, but I haven't found any examples. Maybe I'm using the wrong search terms?
I also looked at the PredictExtractor, since "extracting predictions" sounds close to what I want... but the official documentation appears silent on how that class is supposed to be used. I thought TFTransformOutput sounded like a promising verb, but instead it's a noun.
I'm clearly missing something fundamental here. Is there a reason no one wants to store BulkInferrer results in a database? Is there a configuration option that allows me to write the results to a DB? Maybe I want to add a ParquetIO or BigQueryIO instance to the TFX pipeline? (TFX docs say it uses Beam "under the hood" but that doesn't say much about how I should use them together.) But the syntax in those documents looks sufficiently different from my TFX code that I'm not sure if they're compatible?
Help?
(Copied from the related issue for greater visibility)
After some digging, here is an alternative approach, which assumes no knowledge of the feature_spec beforehand. Do the following:
Set the BulkInferrer to write to output_examples rather than inference_result by adding an output_example_spec to the component construction.
Add a StatisticsGen and a SchemaGen component in the main pipeline right after the BulkInferrer to generate a schema for the aforementioned output_examples.
Use the artifacts from SchemaGen and BulkInferrer to read the TFRecords and do whatever is necessary.
bulk_inferrer = BulkInferrer(
    ....
    output_example_spec=bulk_inferrer_pb2.OutputExampleSpec(
        output_columns_spec=[bulk_inferrer_pb2.OutputColumnsSpec(
            predict_output=bulk_inferrer_pb2.PredictOutput(
                output_columns=[bulk_inferrer_pb2.PredictOutputCol(
                    output_key='original_label_name',
                    output_column='output_label_column_name', )]))]
    ))
statistics = StatisticsGen(
    examples=bulk_inferrer.outputs.output_examples
)
schema = SchemaGen(
    statistics=statistics.outputs.output,
)
After that, one can do the following:
import tensorflow as tf
from tfx.utils import io_utils
from tensorflow_transform.tf_metadata import schema_utils

# read schema from SchemaGen
schema_path = '/path/to/schemagen/schema.pbtxt'
schema_proto = io_utils.SchemaReader().read(schema_path)
spec = schema_utils.schema_as_feature_spec(schema_proto).feature_spec

# read inferred results
data_files = ['/path/to/bulkinferrer/output_examples/examples/examples-00000-of-00001.gz']
dataset = tf.data.TFRecordDataset(data_files, compression_type='GZIP')

# parse dataset with spec
def parse(raw_record):
    return tf.io.parse_example(raw_record, spec)

dataset = dataset.map(parse)
At this point, the dataset is like any other parsed dataset, so it's trivial to write a CSV, or to a BigQuery table, or whatever from there. It certainly helped us in ZenML with our BatchInferencePipeline.
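For example, a minimal sketch of the CSV step, assuming the parsed dataset and spec from above hold only dense features (sparse features would need densifying first):
import csv

keys = sorted(spec.keys())
with open('predictions.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(keys)
    for record in dataset.as_numpy_iterator():
        writer.writerow([record[k] for k in keys])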
Answering my own question here to document what we did, even though I think @Hamza Tahir's answer below is objectively better. This may provide an option for other situations where it's necessary to change the operation of an out-of-the-box TFX component. It's hacky though:
We copied and edited the file tfx/components/bulk_inferrer/executor.py, replacing this transform in the _run_model_inference() method's internal pipeline:
| 'WritePredictionLogs' >> beam.io.WriteToTFRecord(
    os.path.join(inference_result.uri, _PREDICTION_LOGS_FILE_NAME),
    file_name_suffix='.gz',
    coder=beam.coders.ProtoCoder(prediction_log_pb2.PredictionLog)))
with this one:
| 'WritePredictionLogsBigquery' >> beam.io.WriteToBigQuery(
    'our_project:namespace.TableName',
    schema='SCHEMA_AUTODETECT',
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    custom_gcs_temp_location='gs://our-storage-bucket/tmp',
    temp_file_format='NEWLINE_DELIMITED_JSON',
    ignore_insert_ids=True,
)
(This works because when you import the BulkInferrer component, the per-node work gets farmed out to these executors running on the worker nodes, and TFX copies its own library onto those nodes. It doesn't copy everything from user-space libraries, though, which is why we couldn't just subclass BulkInferrer and import our custom version.)
We had to make sure the table at 'our_project:namespace.TableName' had a schema compatible with the model's output, but didn't have to translate that schema into JSON / AVRO.
In theory, my group would like to make a pull request with TFX built around this, but for now we're hard-coding a couple key parameters, and don't have the time to get this to a real public / production state.
I'm a little late to this party but this is some code I use for this task:
import tensorflow as tf
from tensorflow_serving.apis import prediction_log_pb2
import pandas as pd
from typing import List, Text

def parse_prediction_logs(inference_filenames: List[Text]) -> pd.DataFrame:
    """
    Args:
        inference_filenames: tf.io.gfile.glob(Inferrer artifact uri)
    Returns:
        a dataframe of userids, predictions, and features
    """
    def parse_log(pbuf):
        # parse the protobuf
        message = prediction_log_pb2.PredictionLog()
        message.ParseFromString(pbuf)
        # my model produces scores and classes and I extract the topK classes
        predictions = [x.decode() for x in (message
                                            .predict_log
                                            .response
                                            .outputs['output_2']
                                            .string_val
                                            )[:10]]
        # here I parse the input tf.train.Example proto
        inputs = tf.train.Example()
        inputs.ParseFromString(message
                               .predict_log
                               .request
                               .inputs['input_1'].string_val[0]
                               )
        # you can pull out individual features like this
        uid = inputs.features.feature["userId"].bytes_list.value[0].decode()
        feature1 = [
            x.decode() for x in inputs.features.feature["feature1"].bytes_list.value
        ]
        feature2 = [
            x.decode() for x in inputs.features.feature["feature2"].bytes_list.value
        ]
        return (uid, predictions, feature1, feature2)

    return pd.DataFrame(
        [parse_log(x) for x in
         tf.data.TFRecordDataset(inference_filenames, compression_type="GZIP").as_numpy_iterator()
         ], columns=["userId", "predictions", "feature1", "feature2"]
    )
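Hypothetical usage (the artifact path is illustrative, not from the question):
files = tf.io.gfile.glob('/path/to/bulkinferrer/inference_result/*.gz')
df = parse_prediction_logs(files)
print(df.head())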

How do I implement a controlled Rx in Cirq/Tensorflow Quantum?

I am trying to implement a controlled rotation gate in Cirq/Tensorflow Quantum.
The readthedocs.io at https://cirq.readthedocs.io/en/stable/gates.html states:
"Gates can be converted to a controlled version by using Gate.controlled(). In general, this returns an instance of a ControlledGate. However, for certain special cases where the controlled version of the gate is also a known gate, this returns the instance of that gate. For instance, cirq.X.controlled() returns a cirq.CNOT gate. Operations have similar functionality Operation.controlled_by(), such as cirq.X(q0).controlled_by(q1)."
I have implemented
cirq.rx(theta_0).on(q[0]).controlled_by(q[3])
I get the following error:
~/.local/lib/python3.6/site-packages/cirq/google/serializable_gate_set.py in serialize_op(self, op, msg, arg_function_language)
    193                 return proto_msg
    194         raise ValueError('Cannot serialize op {!r} of type {}'.format(
--> 195             gate_op, gate_type))
    196
    197     def deserialize_dict(self,

ValueError: Cannot serialize op cirq.ControlledOperation(controls=(cirq.GridQubit(0, 3),), sub_operation=cirq.rx(sympy.Symbol('theta_0')).on(cirq.GridQubit(0, 0)), control_values=((1,),)) of type <class 'cirq.ops.controlled_gate.ControlledGate'>
I have the qubits and symbols initialized as:
q = cirq.GridQubit.rect(1, 4)
symbol_names = x_0, x_1, x_2, x_3, theta_0, theta_1, z_2, z_3
I do re-use these circuits with various parameter values.
My question: How do I properly implement a controlled Rx in Cirq/Tensorflow Quantum?
P.S. I can't find a tag for Google Cirq
Follow up:
How does this generalize to the similar situations of Controlled Ry and controlled Rz?
For Rz I found a gate decomposition at https://threeplusone.com/pubs/on_gates.pdf, involving H.on(q1), CNOT(q0, q1), H.on(q2), but this is not yet a CRz with an arbitrary angle. Would I introduce the angle before the H?
For the Ry, I did not find a decomposition yet, nor one for the CRy.
What you have is a completely correct implementation of a controlled X rotation in Cirq. It can be used in simulation and other things like cirq.unitary without any issues.
TFQ only supports a subset of gates in Cirq. For example, a cirq.ControlledGate can have an arbitrary number of control qubits, which in some cases can make it harder to decompose down to primitive gates that are compatible with NISQ hardware platforms (this is why cirq.decompose doesn't do anything to ControlledOperations). TFQ only supports these primitive-style gates; for a full list of the supported gates, you can do:
tfq.util.get_supported_gates().keys()
In your case it is possible to come up with a simpler implementation of this gate. First we can note that cirq.rx(some angle) is equal to cirq.X**(some angle / pi) offset by a global phase:
>>> a = cirq.rx(0.3)
>>> b = cirq.X**(0.3 / np.pi)
>>> cirq.equal_up_to_global_phase(cirq.unitary(a), cirq.unitary(b))
True
Let's move to using X now. Then the operation we are after is:
>>> qs = cirq.GridQubit.rect(1,2)
>>> a = (cirq.X**0.3)(qs[0]).controlled_by(qs[1])
>>> b = cirq.CNOT(qs[0], qs[1]) ** 0.3
>>> cirq.equal_up_to_global_phase(cirq.unitary(a), cirq.unitary(b))
True
Since cirq.CNOT is in the TFQ supported gates it should be serializable without any issues. If you want to make a symbolized version of the gate you can just replace the 0.3 with a sympy.Symbol.
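For instance, a sketch of that symbolized form, reusing theta_0 from the question and the rx-style angle convention from above:
>>> import sympy
>>> theta_0 = sympy.Symbol('theta_0')
>>> op = cirq.CNOT(qs[0], qs[1]) ** (theta_0 / np.pi)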
Answer to follow up: If you want to do a CRz you can do the same thing you did above, swapping out the CNOT gate for the CZ gate. For CRy it's not as easy. For that I would recommend doing some combination of: cirq.Y(0) and cirq.YY(0, 1).
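Under the same convention, a sketch of the CRz variant:
>>> crz = cirq.CZ(qs[0], qs[1]) ** (theta_0 / np.pi)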
Edit: tfq-nightly builds and likely releases after 0.4.0 now include support for arbitrary controlled gates. So on these versions of tfq you could also do things like cirq.Y(...).controlled_by(...) to achieve the desired result now too.

customizing completion of GtkComboBoxText

How can I customize the completion of a GtkComboBoxText with both a "static" aspect and a "dynamic" one? The static aspect is because some entries are known and added to the combo-box-text at construction time with gtk_combo_box_text_append_text. The dynamic aspect is because I also need to complete through some callback function(s), that is, to complete dynamically, after creation of the GtkComboBoxText widget, once several characters have been typed.
My application uses Boehm's GC (except for GTK objects, of course), like Guile or SCM or Bigloo do. It can be seen as an experimental persistent dynamic-typed programming language implementation with an integrated editor, coded on and for Debian/Linux/x86-64 with the system GTK3.21 library; it is coded in C99 (some of which is generated) and compiled with GCC6.
(I don't care about non-Linux systems, GTK3 libraries older than GTK3.20, or GCC compilers older than GCC6.)
question details
I'm entering (inputting into the GtkComboBoxText) either a name, or an object-id.
The name is C-identifier-like, but starts with a letter and cannot end with an underscore. For example, comment, if, the_GUI, the_system, payload_json, or x1 are valid names (but _a0bcd or foobar_ are invalid names, because they start or end with an underscore). I currently have a dozen or so names, but I could have a few thousand of them. So it would be reasonable to offer a completion once only a single or perhaps two letters have been typed, and completion for names can happen statically, because there are not many of them (so it feels reasonable to call gtk_combo_box_append_text for each name).
The object-id starts with an underscore followed by a digit and has exactly 18 alphanumeric (sort-of random) characters. For example, _5Hf0fFKvRVa71ZPM0, _8261sbF1f9ohzu2Iu, _0BV96V94PJIn9si1K are possible object-ids. Actually it is 96 almost random bits (probably only 2^94 are possible). The object-id plays the role of UUIDs (in the sense that it is assumed to be world-wide unique for distinct objects) but has a C-friendly syntax. I currently have a few dozen object-ids, but I could have a few hundred thousand (or maybe a million) of them. But given a prefix of four characters like _6S3 or _22z, I am assuming that only a reasonable number (probably at most a dozen, and surely no more than a thousand) of object-ids exist in my application with that prefix. Of course it would be unreasonable to register (statically) all the object-ids a priori; the completion has to happen after four characters have been typed, and should happen dynamically.
So I want a completion that works both on names (e.g. typing one letter, perhaps followed by another alphanumeric character, should be enough to propose a completion of at most a hundred choices) and on object-ids (typing four characters like _826 should be enough to trigger a completion of probably at most a few dozen choices, perhaps a thousand if unlucky).
Hence typing the three keys p a tab would offer completion with a few names like payload_json or payload_vectval etc., and typing the five keys _ 5 H f tab would offer completion with very few object-ids, notably _5Hf0fFKvRVa71ZPM0.
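For reference, here is a sketch of those two token shapes as Python regular expressions (my reading of the rules above, not code from the question):
import re

# name: starts with a letter, may contain alphanumerics and underscores,
# but must not end with an underscore
NAME_RE = re.compile(r'^[A-Za-z](?:[A-Za-z0-9_]*[A-Za-z0-9])?$')
# object-id: '_', a digit, then 16 more alphanumerics (18 characters total)
OBJID_RE = re.compile(r'^_[0-9][A-Za-z0-9]{16}$')

assert NAME_RE.match('payload_json') and not NAME_RE.match('foobar_')
assert OBJID_RE.match('_5Hf0fFKvRVa71ZPM0')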
sample incomplete code
So far I coded the following:
static GtkWidget *
mom_objectentry (void)
{
  GtkWidget *obent = gtk_combo_box_text_new_with_entry ();
  gtk_widget_set_size_request (obent, 30, 10);
  mo_value_t namsetv = mo_named_objects_set ();
I have Boehm-garbage-collected values, and mo_value_t is a pointer to any of them. Values can be tagged integers, pointers to strings, objects, or tuples or sets of objects. So namsetv now contains the set of named objects (probably less than a few thousand named objects).
  int nbnam = mo_set_size (namsetv);
  MOM_ASSERTPRINTF (nbnam > 0, "bad nbnam");
  mo_value_t *namarr = mom_gc_alloc (nbnam * sizeof (mo_value_t));
  int cntnam = 0;
  for (int ix = 0; ix < nbnam; ix++)
    {
      mo_objref_t curobr = mo_set_nth (namsetv, ix);
      mo_value_t curnamv = mo_objref_namev (curobr);
      if (mo_dyncast_string (curnamv))
        namarr[cntnam++] = curnamv;
    }
  qsort (namarr, cntnam, sizeof (mo_value_t), mom_obname_cmp);
  for (int ix = 0; ix < cntnam; ix++)
    gtk_combo_box_text_append_text (GTK_COMBO_BOX_TEXT (obent),
                                    mo_string_cstr (namarr[ix]));
At this point I have sorted all the (few thousand at most) names and added them "statically" using gtk_combo_box_text_append_text.
  GtkWidget *combtextent = gtk_bin_get_child (GTK_BIN (obent));
  MOM_ASSERTPRINTF (GTK_IS_ENTRY (combtextent), "bad combtextent");
  MOM_ASSERTPRINTF (gtk_entry_get_completion (GTK_ENTRY (combtextent)) == NULL,
                    "got completion in combtextent");
I noticed with a bit of surprise that gtk_entry_get_completion (GTK_ENTRY (combtextent)) is null.
But I am stuck here. I am thinking of:
Having some mom_set_complete_objectid(const char*prefix) which given a prefix like "_47n" of at least four characters would return a garbage collected mo_value_t representing the set of objects with that prefix. This is very easy to code for me, and is nearly done.
Make my own local GtkEntryCompletion* mycompl = ..., which would complete like I want. Then I would put it in the text entry combtextent of my gtk-combo-box-text using gtk_entry_set_completion(GTK_ENTRY(combtextent), mycompl);
Should it use the entries added with gtk_combo_box_text_append_text for the "static" name completion role? And how should I dynamically complete using the dynamic set value returned from my mom_set_complete_objectid? Given some object pointer obr and some char bufid[20], I am easily and quickly able to fill it with the object-id of that object obr with mo_cstring_from_hi_lo_ids(bufid, obr->mo_ob_hid, obr->mo_ob_loid).
I don't know how to code the above. For reference, I am now just returning the combo-box-text:
  // If the entered text starts with a letter, I want it to be
  // completed with the appended text above. If the entered text starts
  // with an underscore, then a digit, then two alphanumerics (like _0BV
  // or _6S3 for example), I want to call a completion function.
#warning objectentry: what should I code here?
  return obent;
}                               /* end mom_objectentry */
Is my approach the right one?
The mom_objectentry function above is used to fill modal dialogs with short lifetime.
I am favoring simple code over efficiency. Actually, my code is temporary (I'm hoping to bootstrap my language and generate all its C code!) and in practice I'll probably have only a few hundred names and at most a few tens of thousands of object-ids. So performance is not very important, but simplicity of coding (some conceptually "throw away" code) is more important.
I don't want (if possible) to add my own GTK classes. I prefer using existing GTK classes and widgets, customizing them with GTK signals and callbacks.
context
My application is an experimental persistent programming language and implementation with near Scheme or Python (or JavaScript, ignoring the prototype aspect, ...) semantics but with a widely different (not yet implemented as of September 7th, 2016) syntax (to be shown & input in GTK widgets), using the Boehm garbage collector for values (including objects, sets, tuples, strings...). Values (including objects) are generally persistent (except the GTK-related data: the application starts with a nearly empty window). The entire language heap is persisted in JSON-like syntax in some Sqlite "database" (generated at application exit), dumped into _momstate.sql, which is re-loaded at application startup. Object-ids are useful to show object references to the user in GTK widgets, for persistence, and to generate C code related to the objects (e.g. the object of id _76f7e2VcL8IJC1hq6 could be related to a mo_76f7e2VcL8IJC1hq6 identifier in some generated C code; this is partly why I have my object-id format instead of using UUIDs).
PS. My C code is GPLv3 free software and available on github. It is the MELT monitor, branch expjs, commit e2b3b99ef66394...
NB: The objects mentioned here are implicitly my language objects, not GTK objects. They all have a unique object-id, and some, but not most, of them are named.
I will not show exact code for how to do it, because I have only done GTK with Python, never GTK with C, but it should be fine, as the C and Python functions can easily be translated into one another.
OP's approach is actually the right one, so I will try to fill in the gaps. As the set of static options is limited and probably won't change too much, it indeed makes sense to add them using gtk_combo_box_text_append, which will add them to the internal model of the GtkComboBoxText.
That covers the static part. For the dynamic part, it would be perfect if we could just store this static model and replace it with a temporary model using gtk_combo_box_set_model() when a _ is found at the start of the string. But we shouldn't do this, as the documentation says:
You should not call gtk_combo_box_set_model() or attempt to pack more cells into this combo box via its GtkCellLayout interface.
So we need to work around this. One way of doing that is by adding a GtkEntryCompletion to the entry of the GtkComboBoxText. This will make the entry attempt to complete the current string based on its current model. As an added bonus, it can also inline-complete the characters that all matching options have in common.
As we don't want to load all the dynamic options beforehand, I think the best approach is to connect a changed listener to the GtkEntry; this way we can load the dynamic options when we have an underscore and some characters.
As the GtkEntryCompletion uses a GtkListStore internally, we can reuse part of the code Nominal Animal provided in his answer, the main differences being: the connect is done on the GtkEntry, and the GtkComboBoxText is replaced with the GtkEntryCompletion inside the populator. Then everything should be fine. I wish I were able to write decent C, then I would have provided you with code, but this will have to do.
Edit: A small demo in Python with GTK3
import gi
gi.require_version('Gtk', '3.0')
import gi.repository.Gtk as Gtk

class CompletingComboBoxText(Gtk.ComboBoxText):
    def __init__(self, static_options, populator, **kwargs):
        # Set up the ComboBox with the Entry
        Gtk.ComboBoxText.__init__(self, has_entry=True, **kwargs)
        # Store the populator reference in the object
        self.populator = populator
        # Create the completion
        completion = Gtk.EntryCompletion(inline_completion=True)
        # Specify that we want to use the first col of the model for completion
        completion.set_text_column(0)
        completion.set_minimum_key_length(2)
        # Set the completion model to the combobox model such that we can also autocomplete these options
        self.static_options_model = self.get_model()
        completion.set_model(self.static_options_model)
        # The child of the combobox is the entry if 'has_entry' was set to True
        entry = self.get_child()
        entry.set_completion(completion)
        # Set the active option of the combobox to 0 (which is an empty field)
        self.set_active(0)
        # Fill the model with the static options (could also be used for a history or something)
        for option in static_options:
            self.append_text(option)
        # Connect a listener to adjust the model when the user types something
        entry.connect("changed", self.update_completion, True)

    def update_completion(self, entry, editable):
        # Get the current content of the entry
        text = entry.get_text()
        # Get the completion which needs to be updated
        completion = entry.get_completion()
        if text.startswith("_") and len(text) >= completion.get_minimum_key_length():
            # Fetch the options from the populator for a given text
            completion_options = self.populator(text)
            # Create a temporary model for the completion and fill it
            dynamic_model = Gtk.ListStore.new([str])
            for completion_option in completion_options:
                dynamic_model.append([completion_option])
            completion.set_model(dynamic_model)
        else:
            # Restore the default static options
            completion.set_model(self.static_options_model)

def demo():
    # Create the window
    window = Gtk.Window()
    # Add some static options
    fake_static_options = [
        "comment",
        "if",
        "the_GUI",
        "the_system",
        "payload_json",
        "x1",
        "payload_vectval"
    ]
    # Add the Combobox
    ccb = CompletingComboBoxText(fake_static_options, dynamic_option_populator)
    window.add(ccb)
    # Show it
    window.show_all()
    Gtk.main()

def dynamic_option_populator(text):
    # Some fake returns for the populator
    fake_dynamic_options = [
        "_5Hf0fFKvRVa71ZPM0",
        "_8261sbF1f9ohzu2Iu",
        "_0BV96V94PJIn9si1K",
        "_0BV1sbF1f9ohzu2Iu",
        "_0BV0fFKvRVa71ZPM0",
        "_0Hf0fF4PJIn9si1Ks",
        "_6KvRVa71JIn9si1Kw",
        "_5HKvRVa71Va71ZPM0",
        "_8261sbF1KvRVa71ZP",
        "_0BKvRVa71JIn9si1K",
        "_0BV1KvRVa71ZPu2Iu",
        "_0BV0fKvRVa71ZZPM0",
        "_0Hf0fF4PJIbF1f9oh",
        "_61sbFV0fFKn9si1Kw",
        "_5Hf0fFKvRVa71ozu2",
    ]
    # Only return those that start with the text
    return [fake_dynamic_option for fake_dynamic_option in fake_dynamic_options if fake_dynamic_option.startswith(text)]

if __name__ == '__main__':
    demo()
Here is my suggestion:
Use a GtkListStore to contain a list of GTK-managed strings (essentially, copies of your identifier string) that match the current prefix string.
(As documented for gtk_list_store_set(), a G_TYPE_STRING item is copied. I consider the overhead of the extra copy acceptable here; it should not affect real-world performance much anyway, I think, and in return, GTK+ will manage the reference counting for us.)
The above is implemented in a GTK+ callback function, which gets an extra pointer as payload (set at the time the GUI is created or activated; I suggest you use some structure to keep references you need to generate the matches). The callback is connected to the combobox popup signal, so that it gets called whenever the list is expanded.
Note that as B8vrede noted in a comment, a GtkComboBoxText should not be modified via its model; that is why one should/must use a GtkComboBox instead.
Practical example
For simplicity, let's assume all the data you need to find or generate all known identifiers matched against is held in a structure, say
struct generator {
    /* Whatever data you need to generate prefix matches */
};
and the combo box populator helper function is then something like
static void combo_box_populator(GtkComboBox *combobox, gpointer genptr)
{
    struct generator *const generator = genptr;
    GtkListStore *combo_list = GTK_LIST_STORE(gtk_combo_box_get_model(combobox));
    GtkWidget *entry = gtk_bin_get_child(GTK_BIN(combobox));
    const char *prefix = gtk_entry_get_text(GTK_ENTRY(entry));
    const size_t prefix_len = (prefix) ? strlen(prefix) : 0;
    GtkTreeIter iterator;

    /* Clear the current store */
    gtk_list_store_clear(combo_list);

    /* Initialize the list iterator */
    gtk_tree_model_get_iter_first(GTK_TREE_MODEL(combo_list), &iterator);

    /* Find all you want to have in the combo box;
       for each const char *match, do:
    */
    gtk_list_store_append(combo_list, &iterator);
    gtk_list_store_set(combo_list, &iterator, 0, match, -1);
    /* Note that the string pointed to by match is copied;
       match is not referred to after the _set() returns.
    */
}
When the UI is built or activated, you need to ensure the GtkComboBox has an entry (so the user can write text into it), and a GtkListStore model:
struct generator *generator;
GtkWidget *combobox;
GtkListStore *combo_list;

combo_list = gtk_list_store_new(1, G_TYPE_STRING);
combobox = gtk_combo_box_new_with_model_and_entry(GTK_TREE_MODEL(combo_list));
gtk_combo_box_set_id_column(GTK_COMBO_BOX(combobox), 0);
gtk_combo_box_set_entry_text_column(GTK_COMBO_BOX(combobox), 0);
gtk_combo_box_set_button_sensitivity(GTK_COMBO_BOX(combobox), GTK_SENSITIVITY_ON);
g_signal_connect(combobox, "popup", G_CALLBACK(combo_box_populator), generator);
On my system, the default pop-up accelerator is Alt+Down, but I assume you've already changed that to Tab.
I have a crude working example here (a .tar.xz tarball, CC0): it reads lines from standard input, and lists the ones matching the user prefix in reverse order in the combo box list (when popped-up). If the entry is empty, the combobox will contain all input lines. I didn't change the default accelerators, so instead of Tab, try Alt+Down.
I also have the same example, but using GtkComboBoxText instead, here (also CC0). This does not use a GtkListStore model, but uses the gtk_combo_box_text_remove_all() and gtk_combo_box_text_append_text() functions to manipulate the list contents directly. (There are just a few different lines between the two examples.) Unfortunately, the documentation is not explicit about whether this interface references or copies the strings. Although copying is the only option that makes sense, and this can be verified from the current Gtk+ sources, the lack of explicit documentation makes me hesitant.
Comparing the two examples I linked to above (both grab some 500 random words from /usr/share/dict/words if you compile and run it with make), I don't see any speed difference. Both use the same naïve way of picking prefix matches from a linked list, which means the two methods (GtkComboBox + model, or GtkComboBoxText) should be about equally fast.
On my own machine, both get annoyingly slow with more than 1000 or so matches in the popup; with just a hundred or less matches, it feels instantaneous. This, to me, indicates that the slow/naïve way of picking prefix matches from a linked list is not the culprit (because the entire list is traversed in both cases), but that the GTK+ combo boxes are just not designed for large lists. (The slowdown is definitely much, much worse than linear.)

Django - would these query sets be cached?

class UnassignedThread(models.Manager):
    def get_queryset(self):
        return super(UnassignedThread, self).get_queryset().filter(
            _irc_name__isnull=True)
Would results = ThreadVault.unassigned_threads.all() be cached? I am not certain whether the filter(_irc_name__isnull=True) counts as an evaluation (since evaluation is what populates the cache).
Also, if I have a model called ThreadVault and I want to look up whether threads #777 and #888 exist in the database, which way best utilizes the cache to do the lookup?
ThreadVault.objects.get(thread_id="777")
ThreadVault.objects.get(thread_id="888")
or
results = ThreadVault.objects.all()
for ticket in results:
    if ticket.thread_id == "777" or ticket.thread_id == "888":
        do something
No, querysets are lazy until they are sliced or iterated. filter simply adds conditions to the query, but does not evaluate it.
For your second question, neither of these are great, although the first is vastly preferable to the second (which involves loading and iterating through every object in the table). Instead, you should use exists() in conjunction with an __in filter:
ThreadVault.objects.filter(thread_id__in=["777", "888"]).exists()
Neither of these questions has anything to do with caching.
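To make the evaluation point concrete, a small sketch of when the queryset cache actually kicks in:
qs = ThreadVault.unassigned_threads.all()  # no database query yet
first = list(qs)    # the query runs here; results are cached on the queryset
second = list(qs)   # served from the queryset's cache, no second query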
th_ids = ["777", "888"]
ThreadVault.objects.filter(thread_id__in=th_ids).exists()
For caching your view:
from django.views.decorators.cache import cache_page

@cache_page(60 * 15)
def my_view(request):
    ...

parallel code execution python2.7 ndb

In my app, for one of the handlers I need to get a bunch of entities and execute a function for each one of them.
I have the keys of all the entities I need. After fetching them I need to execute 1 or 2 instance methods for each one of them, and this slows my app down quite a bit: doing this for 100 entities takes around 10 seconds, which is way too slow.
I'm trying to find a way to get the entities and execute those functions in parallel to save time, but I'm not really sure which way is best.
I tried the _post_get_hook, but there I have a future object and need to call get_result() and execute the function in the hook. This works kind of OK in the SDK, but I get a lot of 'maximum recursion depth exceeded while calling a Python object' errors, and I can't really understand why; the error message is not very elaborate.
Is the Pipeline API or ndb.Tasklets what I'm searching for?
At the moment I'm going by trial and error, but I would be happy if someone could point me in the right direction.
EDIT
My code is something similar to a filesystem: every folder contains other folders and files. The path of a Collection is set on another entity, so to serialize a collection entity I need to get the referenced entity and read its path. On a Collection, the serialized_assets() function gets slower the more entities it contains. If I could execute a serialize function for each contained asset side by side, it would speed things up quite a bit.
class Index(ndb.Model):
    path = ndb.StringProperty()

class Folder(ndb.Model):
    label = ndb.StringProperty()
    index = ndb.KeyProperty()
    # contents is a list of keys of contained Folders and Files
    contents = ndb.KeyProperty(repeated=True)

    def serialized_assets(self):
        assets = ndb.get_multi(self.contents)
        serialized_assets = []
        for a in assets:
            kind = a._get_kind()
            assetdict = a.to_dict()
            if kind == 'Collection':
                assetdict['path'] = a.path
                # other operations ...
            elif kind == 'File':
                assetdict['another_prop'] = a.another_property
                # ...
            serialized_assets.append(assetdict)
        return serialized_assets

    @property
    def path(self):
        return self.index.get().path

class File(ndb.Model):
    filename = ndb.StringProperty()
    # other properties....

    @property
    def another_property(self):
        # compute something here
        return computed_property
EDIT2:
@ndb.tasklet
def serialized_assets(self, keys=None):
    assets = yield ndb.get_multi_async(keys)
    raise ndb.Return([asset.serialized for asset in assets])
Is this tasklet code OK?
Since most of the execution time of your functions is spent waiting for RPCs, NDB's async and tasklet support is your best bet. That's described in some detail here. The simplest usage for your requirements is probably to use the Query.map function, like this (from the docs):
@ndb.tasklet
def callback(msg):
    acct = yield msg.author.get_async()
    raise ndb.Return('On %s, %s wrote:\n%s' % (msg.when, acct.nick(), msg.body))

qry = Message.query().order(-Message.when)
outputs = qry.map(callback, limit=20)
for output in outputs:
    print output
The callback function is called for each entity returned by the query, and it can do whatever operations it needs (using _async methods and yield to do them asynchronously), returning the result when it's done. Because the callback is a tasklet, and uses yield to make the asynchronous calls, NDB can run multiple instances of it in parallel, and even batch up some operations.
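Adapted to the models in the question, that could look roughly like this (a sketch, not tested; it assumes contents holds keys):
@ndb.tasklet
def serialize_one(key):
    asset = yield key.get_async()
    raise ndb.Return(asset.to_dict())

@ndb.tasklet
def serialized_assets(self):
    # yielding a list of futures runs the tasklets in parallel
    results = yield [serialize_one(k) for k in self.contents]
    raise ndb.Return(results)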
The pipeline API is overkill for what you want to do. Is there any reason why you couldn't just use a taskqueue?
Use the initial request to get all of the entity keys, and then enqueue a task for each key, having the task execute the 2 functions per entity. The concurrency will then be based on the number of concurrent requests as configured for that taskqueue.
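A rough sketch of that fan-out (the worker URL and queue name are made up):
from google.appengine.api import taskqueue

# initial request: enqueue one task per entity key
for key in entity_keys:
    taskqueue.add(url='/worker/process_entity',
                  params={'key': key.urlsafe()},
                  queue_name='entity-work')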
