Is there a decent way to embed some bijectors inside a bijector in tensorflow_probability?

Let's say I want to build a bijector that has two tfp.bijectors inside of it. Since it's a tfb.Bijector, I have to implement the _forward_log_det_jacobian method.
class Bij(tfb.Bijector):
    def __init__(self, ...):
        super().__init__(..., forward_min_event_ndims=3)
        self.bi1 = Bijector1()
        self.bi2 = Bijector2()

    def _forward(self, x):
        x = self.bi1.forward(x)
        # some ops on x here
        return self.bi2.forward(x)
        # I hope the way it forwards is right. I can't just chain them up because
        # of the ops on x between bi1 and bi2. If there's a better way to achieve
        # this, please let me know!

    ...

    def _forward_log_det_jacobian(self, x):
        # I am a bit confused here. I use bi1.forward_log_det_jacobian(x) to get
        # the Jacobian, then sum it with bi2.forward_log_det_jacobian(x), passing
        # event_ndims=forward_min_event_ndims to both forward_log_det_jacobian() calls.
The output seems to be right, but I'm not quite sure, since bi1 is a Chain that contains a bijector with forward_min_event_ndims=1 (< 3).
Is there a recommended way to achieve something like this, or is what I wrote correct?

Can you use tfb.Chain?
Also bijectors have a __call__ method implemented which will give you a Chain instance, i.e. chain = bij1(bij2).
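For instance, a minimal sketch with two stock bijectors (Exp and Scale here just stand in for your Bijector1/Bijector2):

import tensorflow as tf
import tensorflow_probability as tfp
tfb = tfp.bijectors

# Chain applies the list right-to-left: Exp runs first, then Scale.
chain = tfb.Chain([tfb.Scale(2.0), tfb.Exp()])

# Equivalent composition via __call__:
chain_alt = tfb.Scale(2.0)(tfb.Exp())

# Chain takes care of the log-det-Jacobian bookkeeping, including bijectors
# with differing forward_min_event_ndims:
x = tf.zeros([4, 3, 3, 3])
fldj = chain.forward_log_det_jacobian(x, event_ndims=3)

If the ops between bi1 and bi2 are themselves invertible, you could wrap them in their own small bijector and put all three into one Chain, so the Jacobian accounting stays correct.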


Query on TFP Probabilistic Model

In the TFP tutorial, the model output is a Normal distribution. I noted that the output can be replaced by an IndependentNormal layer. In my model, y_true is a binary class, so I used an IndependentBernoulli layer instead of an IndependentNormal layer.
After building the model, I found that it has two output parameters. That doesn't make sense to me, since a Bernoulli distribution has only one parameter. Do you know what went wrong?
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.models import Sequential

tfd = tfp.distributions
tfpl = tfp.layers

# Define the prior weight distribution as Normal of mean=0 and stddev=1.
# Note that, in this example, the prior distribution is not trainable,
# as we fix its parameters.
def prior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    prior_model = Sequential([
        tfpl.DistributionLambda(
            lambda t: tfd.MultivariateNormalDiag(loc=tf.zeros(n), scale_diag=tf.ones(n))
        )
    ])
    return prior_model

# Define the variational posterior weight distribution as a multivariate Gaussian.
# Note that the learnable parameters for this distribution are the means,
# variances, and covariances.
def posterior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    posterior_model = Sequential([
        tfpl.VariableLayer(tfpl.MultivariateNormalTriL.params_size(n), dtype=dtype),
        tfpl.MultivariateNormalTriL(n)
    ])
    return posterior_model
# Create a probabilistic DL model
model = Sequential([
    tfpl.DenseVariational(units=16,
                          input_shape=(6,),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/X_train.shape[0],
                          activation='relu'),
    tfpl.DenseVariational(units=16,
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/X_train.shape[0],
                          activation='sigmoid'),
    tfpl.DenseVariational(units=tfpl.IndependentBernoulli.params_size(1),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/X_train.shape[0]),
    tfpl.IndependentBernoulli(1, convert_to_tensor_fn=tfd.Bernoulli.logits)
])

model.summary()
[Screenshot of the model.summary() results from running the code on Google Colab omitted]
I agree the summary display is confusing, but I think this is an artifact of the way TFP layers are implemented to interact with Keras. During normal operation, there will only be one return value from a DistributionLambda layer. But in some contexts (that I don't fully grok) DistributionLambda.call may return both a distribution and a side-result. I think the summary plumbing triggers this for some reason, so it looks like there are 2 outputs, but in practice there will only be one.
Try calling your model object on X_train, and you'll see you get a single distribution out (its type is actually something called TensorCoercible, which is a wrapper around a distribution that lets you pass it into TF ops that call tf.convert_to_tensor -- the resulting value for that op will be the result of calling your convert_to_tensor_fn on the enclosed distribution).
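For example, a minimal check along those lines (assuming X_train is a NumPy array of shape (N, 6), as the input_shape suggests):

# Calling the model yields one distribution-like object, not two outputs.
y_dist = model(X_train[:5])
print(type(y_dist))            # a tensor-coercible wrapper around the Bernoulli distribution
print(y_dist.sample().shape)   # (5, 1): batch of 5, event_shape (1,)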
In summary, your distribution layer is fine but the summary is confusing. It could probably be fixed; I'm not keras-knowledgeable enough to opine on how hard it would be.
Side note: you can omit the event_shape=1 parameter -- the default value is (), or "scalar", which will behave the same.
HTH!

using ruby to extract the values in a hash in a DRY way

My app passes a json_element to different methods; the keys differ from call to call, and are sometimes empty.
To handle this, I have been hard-coding the extraction with the following sample code:
def act_on_ruby_tag(json_element)
  begin
    # logger.progname = __method__
    logger.debug json_element
    code = json_element['CODE']['$'] unless json_element['CODE'].nil?
    predicate = json_element['PREDICATE']['$'] unless json_element['PREDICATE'].nil?
    replace = json_element['REPLACE-KEY']['$'] unless json_element['REPLACE-KEY'].nil?
    hash = json_element['HASH']['$'] unless json_element['HASH'].nil?
I would like to eliminate hardcoding the values, but I'm not quite sure how.
I started to think it through as follows:
keys = json_element.keys
keys.each do |k|
  set_key = k.downcase
  instance_variable_set("@" + set_key, json_element[k]['$']) unless json_element[k].nil?
end
And then use @code, for example, in the rest of the method.
I was going to try to turn this into a method and then replace all the hardcoded code.
But I wasn't entirely sure if this is a good path.
It's almost always better to return a hash structure from a method where you have things like { code: ... } rather than setting arbitrary instance variables. If you return them in a consistent container, it's easier for callers to deal with delivering that to the right location, storing it for later, or picking out what they want and discarding the rest.
It's also a good idea to try and break up one big, clunky step with a series of smaller, lighter operations. This makes the code a lot easier to follow:
def extract(json)
  json.reject do |k, v|
    v.nil?
  end.map do |k, v|
    [ k.downcase, v['$'] ]
  end.to_h
end
Then you get this:
extract(
  'TEST' => { '$' => 'value' },
  'CODE' => { '$' => 'code' },
  'NULL' => nil
)
# => {"test"=>"value", "code"=>"code"}
If you want to persist this whole thing as an instance variable, that's a fairly typical pattern, but it will have a predictable name that's not at the mercy of whatever arbitrary JSON document you're consuming.
An alternative is to hard-code the keys in a constant like:
KEYS = %w[ CODE PREDICATE ... ]
Then use that instead, or go one step further and define it in a YAML or JSON file you can read in for configuration purposes. It really depends on how often these will change, and what sort of expectations you have about the irregularity of the input.
This is a slightly more terse way to do what your original code does:
code, predicate, replace, hash = json_element.values_at(*%w{
  CODE PREDICATE REPLACE-KEY HASH
}).map { |x| x.fetch("$", nil) if x }

How to get entities from a searchResults object in GAE NDB

I'm not sure if I'm using the NDB Search API as it's meant to be used. I've read through the documentation, but I think I'm either missing something or lacking in my python skills. Can anyone confirm/improve this progression of using search?
# build the query object
query_options = search.QueryOptions(limit=results_per_page, offset=number_to_offset)
query_object = search.Query(query_string=escaped_param, options=query_options)
# searchResults object
video_search_results = videos.INDEX.search(query_object)
# ScoredDocuments list
video_search_docs = video_search_results.results
# doc_ids
video_ids = [x.doc_id for x in video_search_docs]
# entities
video_entities = [Video.GetById(x) for x in video_ids]
I might personally write this more like:
# build the query object
query_options = search.QueryOptions(limit=results_per_page, offset=number_to_offset)
query_object = search.Query(query_string=escaped_param, options=query_options)
# do the search
video_search = search.Index(name=VIDEO_INDEX).search(query_object)
# list of matching video keys
video_keys = [ndb.Key(Video, x.doc_id) for x in video_search.results]
# get video entities
video_entities = ndb.get_multi(video_keys)
Using ndb.get_multi will be more efficient. You can use AppStats to verify that. You might also look into the async equivalent if you have other processing you can do while the RPCs are outstanding.
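For example, a minimal sketch of the async variant (reusing ndb and video_keys from the snippet above):

# Kick off the datastore fetch without blocking.
video_futures = ndb.get_multi_async(video_keys)

# ... do other processing while the RPCs are in flight ...

# Block only when the entities are actually needed.
video_entities = [f.get_result() for f in video_futures]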
I am not sure what the Video.GetById method actually is, but I would suggest you see the ndb documentation on Model.get_by_id.
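For example (assuming doc_id holds the entity's key name or integer id):

video = Video.get_by_id(x.doc_id)  # accepts a string key name or an integer id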
It seems like everything is OK with your code. Does something go wrong with it?

parallel code execution python2.7 ndb

In my app, for one of the handlers, I need to get a bunch of entities and execute a function for each one of them.
I have the keys of all the entities I need. After fetching them, I need to execute 1 or 2 instance methods for each one of them, and this slows my app down quite a bit: doing this for 100 entities takes around 10 seconds, which is way too slow.
I'm trying to find a way to get the entities and execute those functions in parallel to save time, but I'm not really sure which way is best.
I tried the _post_get_hook, but then I have a future object and need to call get_result() and execute the function in the hook. That works kind of OK in the SDK, but I get a lot of 'maximum recursion depth exceeded while calling a Python object' errors, and I can't really understand why; the error message is not very elaborate.
Is the Pipeline API or ndb.Tasklets what I'm searching for?
At the moment I'm going by trial and error, but I would be happy if someone could point me in the right direction.
EDIT
My code is something similar to a filesystem: every folder contains other folders and files. The path of a Collection is set on another entity, so to serialize a collection entity I need to get the referenced entity and read its path. On a Collection, the serialized_assets() function gets slower the more entities it contains. If I could execute a serialize function for each contained asset side by side, it would speed things up quite a bit.
class Index(ndb.Model):
    path = ndb.StringProperty()

class Folder(ndb.Model):
    label = ndb.StringProperty()
    index = ndb.KeyProperty()
    # contents is a list of keys of contained Folders and Files
    contents = ndb.KeyProperty(repeated=True)

    def serialized_assets(self):
        assets = ndb.get_multi(self.contents)
        serialized_assets = []
        for a in assets:
            kind = a._get_kind()
            assetdict = a.to_dict()
            if kind == 'Collection':
                assetdict['path'] = a.path
                # other operations ...
            elif kind == 'File':
                assetdict['another_prop'] = a.another_property
                # ...
            serialized_assets.append(assetdict)
        return serialized_assets

    @property
    def path(self):
        return self.index.get().path

class File(ndb.Model):
    filename = ndb.StringProperty()
    # other properties....

    @property
    def another_property(self):
        # compute something here
        return computed_property
EDIT2:
@ndb.tasklet
def serialized_assets(self, keys=None):
    assets = yield ndb.get_multi_async(keys)
    raise ndb.Return([asset.serialized for asset in assets])
Is this tasklet code OK?
Since most of the execution time of your functions is spent waiting for RPCs, NDB's async and tasklet support is your best bet. That's described in some detail here. The simplest usage for your requirements is probably the map() method on Query, used like this (from the docs):
@ndb.tasklet
def callback(msg):
    acct = yield msg.author.get_async()
    raise ndb.Return('On %s, %s wrote:\n%s' % (msg.when, acct.nick, msg.body))

qry = Message.query().order(-Message.when)
outputs = qry.map(callback, limit=20)
for output in outputs:
    print output
The callback function is called for each entity returned by the query, and it can do whatever operations it needs (using _async methods and yield to do them asynchronously), returning the result when it's done. Because the callback is a tasklet, and uses yield to make the asynchronous calls, NDB can run multiple instances of it in parallel, and even batch up some operations.
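Applied to the models in your question, a rough (untested) sketch of that pattern might look something like this -- the Collection/Index handling is assumed from your description:

@ndb.tasklet
def serialize_one_async(key):
    # Fetch the asset without blocking, so many of these can run concurrently.
    asset = yield key.get_async()
    assetdict = asset.to_dict()
    if asset._get_kind() == 'Collection':
        # Fetch the referenced Index asynchronously as well.
        index = yield asset.index.get_async()
        assetdict['path'] = index.path
    # (handle the 'File' kind similarly)
    raise ndb.Return(assetdict)

@ndb.tasklet
def serialized_assets_async(folder):
    # Yielding a list of futures waits for all of them together.
    results = yield [serialize_one_async(k) for k in folder.contents]
    raise ndb.Return(results)

Calling serialized_assets_async(folder).get_result() should then give the same kind of list your original method builds, but with the per-asset gets overlapping instead of running one after another.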
The pipeline API is overkill for what you want to do. Is there any reason why you couldn't just use a taskqueue?
Use the initial request to get all of the entity keys, and then enqueue a task for each key having the task execute the 2 functions per-entity. The concurrency will be based then on the number of concurrent requests as configured for that taskqueue.
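A rough sketch of that fan-out (the queue name and handler URL here are made up for illustration):

from google.appengine.api import taskqueue

def enqueue_serialization(asset_keys):
    # One task per entity; the queue's rate and concurrency settings control parallelism.
    for key in asset_keys:
        taskqueue.add(queue_name='serialize-queue',    # hypothetical queue
                      url='/tasks/serialize_asset',    # hypothetical handler
                      params={'key': key.urlsafe()})

Each task handler would then fetch its entity by key and run the 1-2 instance methods on it.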

Idiomatic List Wrapper

In Google App Engine, I make lists of referenced properties much like this:
class Referenced(BaseModel):
    name = db.StringProperty()

class Thing(BaseModel):
    foo_keys = db.ListProperty(db.Key)

    def __getattr__(self, attrname):
        if attrname == 'foos':
            return Referenced.get(self.foo_keys)
        else:
            return BaseModel.__getattr__(self, attrname)
This way, someone can have a Thing and say thing.foos and get something legitimate out of it. The problem comes when somebody says thing.foos.append(x). This will not save the added property because the underlying list of keys remains unchanged. So I quickly wrote this solution to make it easy to append keys to a list:
class KeyBackedList(list):
    def __init__(self, key_class, key_list):
        list.__init__(self, key_class.get(key_list))
        self.key_class = key_class
        self.key_list = key_list

    def append(self, value):
        self.key_list.append(value.key())
        list.append(self, value)

class Thing(BaseModel):
    foo_keys = db.ListProperty(db.Key)

    def __getattr__(self, attrname):
        if attrname == 'foos':
            return KeyBackedList(Referenced, self.foo_keys)
        else:
            return BaseModel.__getattr__(self, attrname)
This is great for proof-of-concept, in that it works exactly as expected when calling append. However, I would never give this to other people, since they might mutate the list in other ways (thing[1:9] = whatevs or thing.sort()). Sure, I could go define all the __setslice__ and whatnot, but that seems to leave me open for obnoxious bugs. However, that is the best solution I can come up with.
Is there a better way to do what I am trying to do (something in the Python library perhaps)? Or am I going about this the wrong way and trying to make things too smooth?
If you want to modify things like this, you shouldn't be changing __getattr__ on the model; instead, you should write a custom Property class.
As you've observed, though, creating a workable 'ReferenceListProperty' is difficult and involved, and there are many subtle edge cases. I would recommend sticking with the list of keys, and fetching the referenced entities in your code when needed.
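A minimal sketch of that recommended approach (thing_key here is just a placeholder):

# Fetch referenced entities only where they're needed.
thing = Thing.get(thing_key)
foos = Referenced.get(thing.foo_keys)

# Appending means mutating the key list itself, then saving.
new_foo = Referenced(name='example')
new_foo.put()
thing.foo_keys.append(new_foo.key())
thing.put()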
