What's the best practice for Thrift file (API) versioning?

I have an API written in thrift. Example:
service Api {
    void invoke()
}
It does something. I want to change its behavior to do something else, but still keep the old behavior available for clients that expect it.
What's the best practice to handle a new API version?

Soft versioning
Thrift supports soft versioning, so it is perfectly valid to do a version 2 of your service which looks like this:
service Api {
    void invoke(1: string optional_arg1, 2: i32 optional_arg2) throws (1: MyError e)
    i32 number_of_invokes()
}
Since the newly added arguments are technically optional, an arbitrary client's request may or may not contain them, or may contain only some of them (e.g. specify arg1 but not arg2). The exception is a bit different: old clients will raise some kind of generic unexpected exception or similar.
It is even possible to remove an outdated function entirely, in this case old clients will get an exception whenever they try to call the (now non-existing) removed function.
All of the above is similarly true with regard to adding member fields to structures, exceptions, etc. Instead of removing declarations from the IDL file, it is recommended to comment out old, removed member fields and functions to prevent people from re-using old field IDs, old function names or old enum values in later versions.
struct foobar {
    // API 1.0 fields
    1: i32 foo
    //2: i32 bar - obsolete with API 2.0
    // API 2.0 fields
    3: i32 baz
}
required is forever
Where you need to be careful is the use of the keyword required. Once you publish an API with a struct containing a required member, you will need to carry it until the structure as a whole is removed. The same is true for adding new required fields later on. Otherwise you risk breaking changes, because mixing old and new clients and servers will sooner or later produce a situation where one end absolutely expects a certain required member field, but the opposite end can't deliver, simply because it does not know anything about it.
This is not a problem with normal or optional fields, since Thrift is designed to skip over unknown fields (the type ID is contained in the wire data), and just ignore missing fields. In contrast, additional checks are applied for required fields to ensure they are present in the wire data.
Endpoints
Although soft versioning is a great tool, it comes at the cost of accumulating burdens due to the need to stay compatible. Furthermore, in some cases your API will undergo breaking changes, intentionally without being backwards compatible. In that case, it is recommended to expose the new service at a different endpoint.
Alternatively, the multiplex protocol introduced with Thrift 0.9.2 can be used to offer multiple services and/or service versions over the same endpoint (i.e. socket, http URI, ...)
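For illustration, here is a rough Python sketch of multiplexing two service versions over one endpoint. The multiplex classes ship with the Thrift Python library, while the ApiV1/ApiV2 names, the handler classes and the api_v1/api_v2 generated modules are hypothetical stand-ins for code generated from two versions of the IDL:

# Sketch only: api_v1 / api_v2 stand in for code generated from two IDL versions,
# and V1Handler / V2Handler are hypothetical handler implementations.
from thrift.TMultiplexedProcessor import TMultiplexedProcessor
from thrift.protocol import TBinaryProtocol
from thrift.protocol.TMultiplexedProtocol import TMultiplexedProtocol
from thrift.transport import TSocket, TTransport
from thrift.server import TServer

import api_v1, api_v2   # hypothetical generated service modules

# Server: register both service versions under distinct names on one socket.
processor = TMultiplexedProcessor()
processor.registerProcessor("ApiV1", api_v1.Api.Processor(V1Handler()))
processor.registerProcessor("ApiV2", api_v2.Api.Processor(V2Handler()))
server = TServer.TSimpleServer(
    processor,
    TSocket.TServerSocket(port=9090),
    TTransport.TBufferedTransportFactory(),
    TBinaryProtocol.TBinaryProtocolFactory())
# server.serve()

# Client: choose the service version by wrapping the protocol with its name.
transport = TTransport.TBufferedTransport(TSocket.TSocket("localhost", 9090))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client_v2 = api_v2.Api.Client(TMultiplexedProtocol(protocol, "ApiV2"))
# transport.open(); client_v2.invoke(...)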

For your specific scenario, you can just add a new method (named something else) that does the new thing. In the future, I would avoid naming methods invoke or similar for this exact reason. Your service will now have a method invoke that does some unknown thing and another method (hopefully named better) that does what it says it does. This may lead to some confusion for your users, but everything will still work.

Related

Can I query a Near contract for its method signatures?

Is there a way to query what methods are offered by a given NEAR contract? (So that one could do autodiscovery of some standard interface, for instance.) Or do you have to just know the method signatures already before you can interact with a contract?
No, not yet. Currently all contract methods have the same signature, () -> (): no arguments and nothing returned. Each method has a wrapper function that deserializes the input bytes from the host, calls the method, and serializes the return value and passes the bytes back to the host.
This is done with input and value_return (see input).
There are plans to include the actual signatures of the methods in the binary in a special section, which would solve this issue.
NEP-351 was recently approved, which provides a mechanism for contracts to expose all standards they implement. However, it is up to contract developers to follow this NEP. When integrated into the main SDK, I presume most will.
Alternatively, there is a proposal to create a global registry as a smart contract that provides this information.
Currently, there is not.
You will need to know what contract methods are available in order to interact with a smart contract deployed on NEAR. Hopefully, the ability to query available methods will be added in the near future.
I suppose you can just include a method in your own contracts that returns the other method signatures in some useful format (JSON or whatever).
You would have to make sure that it stays current, maybe by writing some unit tests that use this method to exercise all the others.
I suppose this interface (method and unit tests) could be standardized as an NEP in the short term, until our interface becomes discoverable. Any contracts that adhere to this NEP must include this "tested reflection method" or "documentation method" or whatever it would be called.

How to pass Gatling session attributes in an exec() invoking another library (generated gRPC code)?

Newbie Gatling+Scala question: I’m using George Leung's gatling-grpc library (which is modeled after the http library) and trying to pass a value from the session (generated in a feeder), into a non-DSL, non-Gatling method call, specifically calls populating the gRPC payload object.
Before I start, let me add that it seems I can’t use the sessionFunction (Expression[T]) form of exec, which would resolve my issue:
.exec{ session => { … grpc(…).rpc(…)… }}
…because, AFAICT, the grpc call must be the last thing in the block, or else it’s never evaluated ... yet it can’t be the last thing in the block because there’s no way to coerce it to return a Session object (again, AFAICT).
Therefore, I have to use the ActionBuilder form of exec (grpc(...) returns a Call so this is as designed):
.exec( grpc(…).rpc(…)... )
and this works… until I have a gRPC payload (i.e., non-Gatling) method call to which I need to pass a non-constant value (from a feeder).
In this context, I have no access to a Session object, and the Gatling Expression Language is not applied because the library defining the gRPC types I need to use (to generate the payload) has no knowledge of Gatling.
So, in this fragment:
.header(transactionIdHeader)("${tid}.SAVE")
.payload(Student.newBuilder()
    .setId(GlobalId.newBuilder().setValue("${authid}_${uniqId}").build()).build())
)
…the first call evaluates ${tid} because the param in the second parens is Expression[T], and hence is evaluated as Expression Language, but the second call fails to evaluate ${authid} or ${uniqId} because the external, generated library that defines the gRPC type GlobalId has no knowledge of Gatling.
So...
Is there a way to invoke the EL outside of Gatling's DSL?
Or a way to access a Session object via an ActionBuilder?
(I see that the Gatling code magically finds a Session object when I use the sessionFunction form, but I can't see whence it comes — even looking at the bytecode is not illuminating)
Or, turning back to the Expression[T] form of exec, is there a way to have an ActionBuilder return a Session object?
Or, still in the Expression[T] form, I could trivially pass back the existing Session object, if I had a way to ensure the grpc()... expression was evaluated (i.e., imperative programming).
Gatling 3.3.1, Scala 2.12.10
The gatling-grpc library is at phiSgr/gatling-grpc; I'm using version 0.7.0 (com.github.phisgr:gatling-grpc).
(The gRPC Java code is generated from .proto files, of course.)
You need the Gatling-JavaPB integration.
To see that in action, see here.
The .payload method takes an Expression[T], which is an alias for Session => Validation[T]. In plain English, that is a function that constructs the payload from the session with a possibility of failure.
Much of your frustration is not knowing how to get hold of a Session. I hope this clears up the confusion.
In the worst case one can write a lambda to create an expression. But for string interpolation or accessing a single object, Gatling provides an implicit conversion to turn an EL String into an Expression.
The problem is you want to construct well-typed payloads and Gatling's EL cannot help with that. The builders’ setters want a T, but you only have an Expression[T] (either from EL or the $ function). The library mentioned above is created to handle that plumbing.
After importing com.github.phisgr.gatling.javapb._, you should write the following.
...
.payload(
    Student.getDefaultInstance
        .update(_.getIdBuilder.setValue)("${authid}_${uniqId}")
)
For the sake of completeness, see the warning in Gatling's documentation for why defining actions in .exec(sessionFunction) is not going to work.

Can I save changes to objects in another TR besides the one they are locked in?

When I try to switch to edit mode for a Report source, a popup comes up telling me
"A new task will be created for the following request of user XXX".
A transport request is also being suggested.
I don't want to save my changes in this request however, but in another existing one. I am not aware of any versioning systems being implemented in my system, and don't know how to check that.
Is what I'm trying to achieve possible? And if so, how?
No, this is not possible. There are very good reasons for this being an exclusive lock -- reasons that you should know about before you attempt to change anything. Briefly speaking:
1. The CTS only notes that an object was touched, not what change was made.
2. When the transport is released, the entire object in its current state is exported - there is no delta/diff logic involved.
Therefore you can't separately transport changes to the same development object. Furthermore, if you serialize this manually, the second transport will always comprise the changes of the first one.
Things get slightly more complicated with partial objects - you can have LIMU METH objects (methods of a class) in different transports, but as soon as you try to lock the R3TR CLAS main class, you'll have to resolve that.

Django Models / SQLAlchemy are bloated! Any truly Pythonic DB models out there?

"Make things as simple as possible, but no simpler."
Can we find the solution/s that fix the Python database world?
Update: A 'lustdb' prototype has been written by Alex Martelli - if you know any somewhat lightweight, high-level database libraries with multiple backends we could wrap in syntax sugar honey, please weigh in!
from someAmazingDB import *
#we imported a smart model class and db object which talk to database adapter/s

class Task(model):
    title = ''
    done = False  #native types not a custom object we have to think about!

db.taskList = []
#or
db.taskList = expandableTypeCollection(Task)  #not sure what this syntax would be
db['taskList'].append(Task(title='Beat old sql interfaces', done=False))
db.taskList.append(Task('Illustrate different syntax modes', True))  # ok maybe we should just use kwargs
#at this point it should be autosaved to a default db option
#by default we should be able to reload the console and access the default db:

>>> from someAmazingDB import *
>>> print 'Done tasks:'
>>> for task in db.taskList:
...     if task.done:
...         print task.title
'Illustrate different syntax modes'
I'm a fan of Python, web.py and CherryPy, and KISS in general.
We're talking automatic Python to SQL type translation or NoSQL.
We don't have to totally be SQL compatible! Just a scalable subset or ignore it!
Re: model changes, it's OK to ask the developer when they try to change it, or to have a set of sensible defaults.
Here is the challenge: The above code should work with very little modification or thinking required. Why must we put up with compromise when we know better?
It's 2010, we should be able to code scalable, simple databases in our sleep.
If you think this is important, please upvote!
What you request cannot be done in Python 2.whatever, for a very specific reason. You want to write:
class Task(model):
    title = ''
    isDone = False
In Python 2.anything, whatever model may possibly be, this cannot ever allow you to predict any "ordering" for the two fields, because the semantics of a class statement are:
1. execute the body, thus preparing a dict
2. locate the metaclass and run special methods thereof
Whatever the metaclass may be, step 1 has destroyed any predictability of the fields' order.
Therefore, your desired use of positional parameters, in the snippet:
Task('Illustrate different syntax modes', True)
cannot associate the arguments' values with the model's various fields. (Trying to guess by type association -- hoping no two fields ever have the same type -- would be even more horribly unpythonic than your expressed desire to use db.tasklist and db['tasklist'] indifferently and interchangeably).
One of the backwards-incompatible changes in Python 3 was introduced specifically to deal with situations of this ilk. In Python 3, a custom metaclass can define a __prepare__ function which runs before "step 1" in the above simplified list, and this lets it have more control about the class's body. Specifically, quoting PEP 3115...:
__prepare__ returns a dictionary-like object which is used to store
the class member definitions during evaluation of the class body.
In other words, the class body is evaluated as a function block
(just like it is now), except that the local variables dictionary
is replaced by the dictionary returned from __prepare__. This
dictionary object can be a regular dictionary or a custom mapping
type.
...
An example would be a metaclass that
uses information about the
ordering of member declarations to create a C struct. The metaclass
would provide a custom dictionary that simply keeps a record of the
order of insertions.
You don't want to "create a C struct" as in this example, but the order of fields is crucial (to allow the use of positional parameters that you want) and so the custom metaclass (obtained through base model) would have a __prepare__ classmethod returning an ordered dictionary. This removes the specific issue, but, of course, only if you're willing to switch all of your code using this "magic ORM" to Python 3. Would you be?
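For illustration, here is a minimal Python 3 sketch of that idea (nothing to do with any real library; the ModelMeta/model/Task names are made up): a metaclass whose __prepare__ returns an ordered mapping, so field declaration order is preserved and positional arguments can be matched to fields.

import collections

class ModelMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        # The class body is evaluated into this mapping, so the insertion
        # order of field declarations is preserved. (Since Python 3.6 a
        # plain dict would also keep insertion order.)
        return collections.OrderedDict()

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Remember the declared (non-dunder) fields, in declaration order.
        cls._fields = [k for k in namespace if not k.startswith('__')]
        return cls

class model(metaclass=ModelMeta):
    def __init__(self, *args, **kwargs):
        # Positional args are matched to fields by declaration order.
        for field, value in zip(self._fields, args):
            setattr(self, field, value)
        for field, value in kwargs.items():
            setattr(self, field, value)

class Task(model):
    title = ''
    done = False

t = Task('Illustrate different syntax modes', True)
print(t.title, t.done)   # -> Illustrate different syntax modes True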
Once that's settled, the issue is, what database operations do you want to perform, and how. Your example, of course, does not clarify this at all. Is the taskList attribute name special, or should any other attribute assigned to the db object be "autosaved" (by name and, what other characteristic[s]?) and "autoretrieved" upon use? Are there to be ways to remove entities, alter them, locate them (otherwise than by having once been listed in the same attribute of the db object)? How does your sample code know what DB service to use and how to authenticate to it (e.g. by userid and password) if it requires authentication?
The specific tasks you list would not be hard to implement (e.g. on top of Google App Engine's storage service, which does not require authentication nor specification of "what DB service to use"). model's metaclass would introspect the class's fields and generate a GAE Model for the class, the db object would use __setattr__ to set an atexit trigger for storing the final value of an attribute (as an entity in a different kind of Model of course), and __getattr__ to fetch that attribute's info back from storage. Of course without some extra database functionality this all would be pretty useless;-).
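For illustration only, a rough sketch of that autosave/autoload mechanism, with a pickle file standing in for GAE's storage service (the _Db class and the /tmp path are made up for this sketch; it is not the lustdb prototype mentioned below):

import atexit
import os
import pickle

_STORE = "/tmp/autodb.pickle"   # placeholder backing file

class _Db(object):
    def __init__(self):
        object.__setattr__(self, "_data", {})
        if os.path.exists(_STORE):
            with open(_STORE, "rb") as f:
                self._data.update(pickle.load(f))
        # atexit trigger: persist whatever was assigned, at interpreter exit
        atexit.register(self._flush)

    def _flush(self):
        with open(_STORE, "wb") as f:
            pickle.dump(self._data, f)

    def __setattr__(self, name, value):
        self._data[name] = value            # "autosave" on assignment

    def __getattr__(self, name):
        try:
            return self._data[name]         # "autoretrieve" on access
        except KeyError:
            raise AttributeError(name)

db = _Db()
# db.taskList = [...]  -> persisted at exit; a later session sees db.taskList again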
Edit: so I did a little prototype (Python 2.6, and based on sqlite) and put it up on http://www.aleax.it/lustdb.zip -- it's a 3K zipfile including the 225-line lustdb.py (too long to post here) and two small test files roughly equivalent to the OP's originals: test0.py is...:
from lustdb import *

class Task(Model):
    title = ''
    done = False

db.taskList = []
db.taskList.append(Task(title='Beat old sql interfaces', done=False))
db.taskList.append(Task(title='Illustrate different syntax modes', done=True))
and test1.py is...:
from lustdb import *

print 'Done tasks:'
for task in db.taskList:
    if task.done:
        print task
Running test0.py (on a machine with a writable /tmp directory -- i.e., any Unix-y OS, or, on Windows, one on which a mkdir \tmp has been run at any previous time;-) has no output; after that, running test1.py outputs:
Done tasks:
Task(done=True, title=u'Illustrate different syntax modes')
Note that these are vastly less "crazily magical" than the OP's examples, in many ways, such as...:
1. no (expletive delete) redundancy whereby `db.taskList` is a synonym of `db['taskList']`, only the sensible former syntax (attribute-access) is supported
2. no mysterious (and totally crazy) way whereby a `done` attribute magically becomes `isDone` instead midway through the code
3. no mysterious (and utterly batty) way whereby a `print task` arbitrarily (or magically?) picks and prints just one of the attributes of the task
4. no weird gyrations and incantations to allow positional-attributes in lieu of named ones (this one the OP agreed to)
The prototype of course (as prototypes will;-) leaves a lot to be desired in many respects (clarity, documentation, unit tests, optimization, error checking and diagnosis, portability among different back-ends, and especially DB features beyond those implied in the question). The missing DB features are legion (for example, the OP's original examples give no way to identify a "primary key" for a model, or any other kinds of uniqueness constraints, so duplicates can abound; and it only gets worse from there;-). Nevertheless, for 225 lines (190 net of empty lines, comments and docstrings;-), it's not too bad in my biased opinion.
The proper way to continue playing with this project would of course be to initiate a new lustdb open source project on the hosting part of code.google.com (or any other good open source hosting site with issue tracker, wiki, code reviews support, online browsing, DVCS support, etc, etc) - I'd do it myself but I'm close to the limit in terms of number of open source projects I can initiate on code.google.com and don't want to "burn" the last one or two in this way;-).
BTW, the lustdb name for the module is a play on words with the OP's initials (first two letters each of first and last names), in the tradition of awk and friends -- I think it sounds nice (and most other obvious names such as simpledb and dumbdb are taken;-).
I think you should try ZODB. It is an object-oriented database designed for storing Python objects. Its API is quite close to the example you have provided in your question; just take a look at the tutorial.
What about using Elixir?
Forget ORM! I like vanilla SQL. The Python wrappers like psycopg2 for PostgreSQL do automatic type conversion, offer pretty good protection against SQL injection, and are nice and simple.
sql = "SELECT * FROM table WHERE id=%s"
data = (5,)
cursor.execute(sql, data)
The more I think on't the more the Smalltalk model of operation seems more relevant. Indeed the OP may not have reached far enough by using the term "database" to describe a thing which should have no need for naming.
A running Python interpreter has a pile of objects that live in memory. Their inter-relationships can be arbitrarily complex, but namespaces and the "tags" that objects are bound to are very flexible. And as pickle can explicitly serialize arbitrary structures for persistence, it doesn't seem that much of a reach to consider each Python interpreter living in that object space. Why should that object space evaporate with the interpreter's close? Semantically, this could be viewed as an extension of the anydbm tied dictionaries. And since most every thing in Python is dictionary-like, the mechanism is almost already there.
I think this may be the generic model that Alex Martelli was proposing above; it might be nice to say something like:
class Book:
    def __init__(self, attributes):
        self.attributes = attributes
    def __getattr__(....): pass

$ python
>>> import book
>>> my_stuff.library = {'garp':
        Book({'author': 'John Irving', 'title': 'The World According to Garp',
              'isbn': '0-525-23770-4', 'location': 'kitchen table',
              'bookmark': 'page 127'}),
        ...
    }
>>> exit
[sometime next week]
$ python
>>> import my_stuff
>>> print my_stuff.library['garp'].location
'kitchen table'
# or even
>>> for book in my_stuff.library where book.location.contains('kitchen'):
        print book.title
I don't know that you'd call the resultant language Python, but it seems like it is not that hard to implement and makes backing store equivalent to active store.
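For what it's worth, the standard library's shelve module already approximates that persistent object space. Here is a small sketch (the my_stuff name is borrowed from the example above, and the where-style query is emulated with a plain loop, since that syntax isn't Python):

import shelve

class Book(object):
    def __init__(self, **attributes):
        self.__dict__.update(attributes)

# First session: bind objects into a dict-like namespace that outlives the interpreter.
library = shelve.open("my_stuff")          # backing file(s) created on disk
library["garp"] = Book(author="John Irving",
                       title="The World According to Garp",
                       location="kitchen table")
library.close()

# Sometime next week, in a fresh interpreter:
library = shelve.open("my_stuff")
print(library["garp"].location)            # -> kitchen table
for key in library:                        # poor man's "where" clause
    book = library[key]
    if "kitchen" in book.location:
        print(book.title)
library.close()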
There is a natural tension between the inherent structure imposed - and sometimes desired - by RDBMSs and the rather free-form navel-gazing put forth here, but NoSQL-y databases are already approaching the content-addressable memory model and probably better approximate how our minds keep track of things. Contrariwise, you wouldn't want to keep all the corporate purchase orders in such a storage system, but perhaps you might.
How about you give an example of how "simple" you want your "dealing with the database" to be, and I then tell you all the stuff that is needed for that "simplicity" to get working?
(And of which it will still be YOU that will be required to give the information/config to the database interface engine, somewhere, somehow.)
To name but one example:
If your database management engine is some external machine with which you/your app interfaces over IP or some such, there is no way around the fact that the IP identity of where that database engine is running will have to be provided by your app's database interface client, somewhere, somehow -- regardless of whether that gets explicitly exposed in the code or not.
I've been busy, here it is, released under LGPL:
http://github.com/lukestanley/lustdb
It uses JSON as its backend at the moment.
This is not the same codebase Alex Martelli did. I wanted to make the code more readable and reusable with different backends and such.
Elsewhere I have been working on object-oriented HTML elements accessible in Python in similar ways, AND a library for making web.py more minimalist.
I'm thinking of ways of using all 3 elements together with automatic MVC prototype construction or smart mapping.
While old-fashioned text-based template web programming will be around for a while still, because of legacy systems and because it doesn't require any particular library or implementation, I feel soon we'll have a lot more efficient ways of building robust, prototype-friendly web apps.
Please see the mailing list for those interested.
If you like CherryPy, you might like the complementary ORMs I wrote: GeniuSQL (which follows a Table Data gateway model) and Dejavu (which is a complete Data Mapper).
There's far too much in this question and all its subcomments to address completely, but one thing I wanted to point out was that GeniuSQL and Dejavu have a very robust system for mapping native Python types to the types that your particular backend is using. There are very sane defaults, which can be overridden as needed, and even extended if you make a new backend or use types from a backend that isn't yet supported. See http://www.aminus.net/geniusql/chrome/common/doc/trunk/advanced.html#custom for more discussion on that.

Repository Pattern Question

I'm building an ASP.NET MVC app and I'm using a repository to store and retrieve view objects. My question is: is it okay for the implementation of the various repositories to call each other? I.e., can the ICustomerRepository implementation call an implementation of IAddressRepository, or should it handle its own updates to the address data source?
EDIT:
Thanks everyone, but the Customer/Address example isn't real. The actual problem involves three aggregates which update a fourth aggregate in response to changes in their respective states. In this case it seems to be a conflict between introducing dependencies and violating the don't-repeat-yourself principle.
You should have a repository for each aggregate root.
I have no knowledge of your domain-model, but it doesn't feel natural for me to have an IAddressRepository. Unless 'Address' is an aggregate root in your domain.
In fact, in most circumstances, 'Address' is not even an entity, but a value object. That is, in most cases the identity of an 'Address' is determined by its value (the value of all its properties); it does not have a separate 'Id' (key) field.
So, when this is the case, the CustomerRepository should be responsible for storing the Address, as the Address is a part of the Customer aggregate-root.
Edit (ok so your situation was just an example):
But, if you have other situations where you would need another repository in a repository, then I think that it is better to remove that functionality out of that repository, and put it in a separate class (Service).
I mean: I think that if you have some functionality inside a repository A that relies on another repository B, then this kind of functionality doesn't belong inside repository A.
Instead, write another class (which is called a Service in DDD), in where you implement this functionality.
Anyway, I do not think that repositories should really call each other. If you do not want to write a Service however, and if you really want to keep that logic inside the repository itself, then pass the other repository as an argument in that specific method.
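For instance, a tiny sketch of that shape (illustrative names only, and sketched in Python rather than C# purely for brevity):

class CustomerRepository(object):
    def get(self, customer_id):
        raise NotImplementedError
    def save(self, customer):
        raise NotImplementedError

class AddressRepository(object):
    def save(self, address):
        raise NotImplementedError

class CustomerRelocationService(object):
    # The cross-aggregate logic lives here, so neither repository
    # needs to know about the other one.
    def __init__(self, customers, addresses):
        self._customers = customers
        self._addresses = addresses

    def relocate(self, customer_id, new_address):
        customer = self._customers.get(customer_id)
        self._addresses.save(new_address)
        customer.address = new_address
        self._customers.save(customer)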
I hope I made myself a bit clear. :P
They really shouldn't call each other. A Repository is an abstraction of the (more or less) atomic operations that you want to perform on your domain. The less dependency they have, the better. Realistically, any consumer of a repository should expect to be able to point the repository class at a database and have it perform the necessary domain operations without a lot of configuration.
They should also represent "aggregates" in your domain - i.e. key focal points that a lot of functionality will be based around. I'm wondering why you would have a separate address information repository? Shouldn't that be part of your customer repository?
This depends on the type of repository (or at least the consequences do), but in general if you have data repositories calling each other you're going to run into problems with things like cyclical dependencies (repo A requires B -> B requires C -> oops, C requires A) or recursive data loads (A requires B & C -> C requires D, E -> ... ad nauseam). Testing also becomes more difficult.
For example, you need to load your address repository to properly run your customer repository, because the customer repository calls the address repo. If you need to test the customer repo, you'll need to do a db load of the addresses or mock them in some way, and ultimately you won't be able to load and test any single system repository without loading them all.
Having those dependencies is also kind of insidious because they're often not clear - usually you're dealing with a repository as a data-holding abstraction - if you have to be conscious of how they depend on each other you can't use them as an abstraction, but have to manage the load process whenever you want to use them.
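To make the testing cost concrete, here is a small illustrative sketch (Python and unittest.mock, with made-up names): a repository that quietly calls another repository forces every test of it to fake that second repository as well.

import unittest
from unittest.mock import Mock

class CustomerRepository(object):
    def __init__(self, db, address_repository):
        self._db = db
        self._addresses = address_repository   # hidden repo-to-repo dependency

    def save(self, customer):
        self._addresses.save(customer.address) # can't exercise save() without it
        self._db.insert("customers", customer)

class CustomerRepositoryTest(unittest.TestCase):
    def test_save_forces_you_to_fake_the_address_repo_too(self):
        db = Mock()
        addresses = Mock()                     # extra fake, needed only because of the dependency
        repo = CustomerRepository(db, addresses)
        repo.save(Mock(address="somewhere"))
        addresses.save.assert_called_once_with("somewhere")
        db.insert.assert_called_once()

if __name__ == "__main__":
    unittest.main()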
