Architecting a SaaS for backwards compatibility with regard to data and business logic - versioning

I have a SaaS platform where the user fills out a form and data entered into the form is saved to a database. The form UI has a large amount of config (originates from the DB but ends up in JavaScript) and business logic (in JavaScript). After a form is filled out and saved, the user can go back at any time and edit it.
The wrinkle is that an old form entry needs to behave like it did when it was first filled out - it needs the same config and business logic - even if the SaaS has since gone through data schema changes and business logic changes.
To confirm: new forms filled out by the user would use the new/current data schema and business logic, of course. But previous forms need to behave as they did when they were created.
So I need a sensible way to version config, business logic and any dependencies.
The best I've come up with is, when the user saves their entry, to save the form's config as JSON along with the entry. When the user goes back to edit an old entry, I do not load the config from the current database schema but simply load the JSON config that was saved with the entry.
For the business logic, I save a system version number along with the entry, for example "01". When the user loads an old form, I check the version of the entry and then load the form JavaScript from a path like "js/main_01.js". When I make a non-backwards-compatible change to the business logic, I increase the system's version number to, for example, "02". New forms then use "js/main_02.js". I also use this cheap versioning approach for HTML view templates, which is getting hairy.
This approach works but it seems a bit flimsy or homegrown. I'm trying to avoid conditionals in my business logic like if version == 2: do this. This approach avoids that, but it also has its downsides.
I don't think the stack really matters for this conversation, but just in case: I'm using Django/MySQL.
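To make that concrete, a minimal sketch of the snapshot approach (Django; model and field names are illustrative):

# Minimal sketch of the snapshot approach (names are illustrative).
from django.db import models

SYSTEM_VERSION = "02"  # bumped on every non-backwards-compatible change

class FormEntry(models.Model):
    answers = models.TextField()          # the user's submitted data
    config_snapshot = models.TextField()  # form config frozen as JSON at save time
    system_version = models.CharField(max_length=8)  # e.g. "01", "02"

def script_for(entry):
    # Old entries keep loading the JavaScript they were created with;
    # new entries get the current version.
    return "js/main_%s.js" % entry.system_version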

You're likely to get a tremendous amount of "opinion" on this, and no real clear answer.
You could develop an API to your config and logic in many ways, with versioning saved with the submitted data, thereby requiring an API-manager solution.
However, you could instead store the entire DOM object in the record where the data was stored, thereby creating a static page that can be recalled and resubmitted at will, with separation between view and model.

Related

Where to store possibly sensitive but unimportant information

I am working on an app which touches sensitive information, like money.
We have some calculators, and we want to prefill the values with whatever the user entered last. Apart from improving UX, we don't need those values. But we cannot store them in web storage or a cookie because of security.
We have
a JS frontend,
an API Gateway backend that is supposed to be "stupid", so it only handles authentication and routing messages to the corresponding services
some services that actually care about the business logic
These possibilities come to mind, and I cannot decide which I should choose (and, foremost, why):
Add a table in backend, that is a catch all for implementing cookie-like functionality in backend
Add a specific table in the service it fits the most
Use a key value store in backend (don't know about this, a coworker put it out there)
As I read your requirements, it seems that this is a kind of defaulting that includes some business logic (stupid or smart). Personally, I see defaulting as part of the business logic, and on that basis it belongs in the service that cares about this functionality.
Add a table in backend, that is a catch all for implementing cookie-like functionality in backend
This sounds like a generic solution for a pretty generic requirement. What do you want to achieve with this?
Add a specific table in the service it fits the most
Sounds reasonable, especially because you put it where it belongs. Does it have to be a table? Why not calculate or copy the values at runtime?
Use a key value store in backend (don't know about this, a coworker put it out there)
This is maybe a technological decision, but first you need a design decision.
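If a table is the way you go, a minimal sketch of option 2 - a small per-user defaults table inside the service that owns the calculators (Django-flavoured; all names are illustrative):

# Sketch: last-entered values stored in the service that owns the
# calculator logic (names are illustrative, not from the question).
from django.db import models

class CalculatorDefault(models.Model):
    user_id = models.CharField(max_length=64)    # from the gateway's auth token
    field_name = models.CharField(max_length=64)
    last_value = models.CharField(max_length=255)

    class Meta:
        unique_together = ("user_id", "field_name")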

Data transition over multiple application versions

When upgrading a GAE application, what is the best way to upgrade the data model?
The version number of the application allows multiple versions to be kept separate, but these application versions use the same data store (according to How to change application after deployed into Google App Engine?). So what happens when I upload a version of the application with a different data model (I'm thinking Python here, but the question should also be valid for Java)? I guess it shouldn't be a problem if the changes add a nullable field and some new classes, so the existing model can be extended without harm. But what if the data model changes are more profound? Do I actually lose the existing data if it becomes inconsistent with the new data model?
The only option I see for the moment is putting the data store into maintenance read-only mode, transforming the data offline, and deploying the whole thing again.
There are a few ways of dealing with that, and they are not mutually exclusive:
Make non-breaking changes to your datastore and work around the issues this creates. Inserting new fields into existing model classes, switching fields from required to optional, adding new models, etc. - these won't break compatibility with any existing entities. But since those entities do not magically change to conform to the new model (remember, the datastore is a schema-less DB), you might need legacy code that partially supports the old model.
For example, if you have added a new field, you will want to access it via getattr(entity, "field_name", default_value) rather than entity.field_name, so that it doesn't result in an AttributeError for old entities.
Gradually convert the entities to the new format. This is quite simple: if you find an entity that still uses the old model, make the appropriate changes. In the example above, you would want to put the entity back with the new field added:
if not hasattr(entity, "field_name"):
    entity.field_name = default_value
    entity.put()
val = entity.field_name  # no getattr'ing needed now
Ideally, all your entities will eventually be processed in this manner and you will be able to remove the converting code at some point. In reality, there will always be some leftovers which have to be converted manually - and this brings us to option number three...
Batch-convert your entities to the new format. The complexity of the logistics behind this depends greatly on the number of entities to process, your site's activity, the resources you can devote to the process, etc. Just note that a straightforward MapReduce may not be the best idea - especially if you used the gradual-conversion technique described above. This is because MapReduce processes (and fetches) all entities of a given kind, while only a tiny percentage may need converting. Hence it can be beneficial to write the conversion code by hand, querying for old entities explicitly, e.g. using a library such as ndb.
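A sketch of such a hand-rolled converter (assuming an ndb model MyModel gaining a new property field_name; both names are illustrative). Since the datastore cannot query for a missing property, this pages through the kind with a cursor and only rewrites entities still in the old format:

# Sketch of a hand-rolled batch converter (MyModel and field_name are
# illustrative; DEFAULT_VALUE is whatever the new field should hold).
from google.appengine.ext import ndb

DEFAULT_VALUE = 0

def convert_page(cursor=None, batch_size=100):
    entities, next_cursor, more = MyModel.query().fetch_page(
        batch_size, start_cursor=cursor)
    # Only touch entities still in the old format.
    stale = [e for e in entities if getattr(e, "field_name", None) is None]
    for entity in stale:
        entity.field_name = DEFAULT_VALUE
    if stale:
        ndb.put_multi(stale)  # one RPC per page of converted entities
    return next_cursor, more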

Architectural Design for a Data-Driven Silverlight WP7 app

I have a Silverlight Windows Phone 7 app that pulls data from a public API. I find myself doing much of the same thing over and over again:
In the UI, set a loading message or loading progress bar in place of where the content is
Get the content, which may be already in memory, cached in isolated file storage, or require an HTTP request
If the content cannot be acquired (no network connection, etc.), display an error message
If the content is acquired, display it in the UI
Keep the content in main memory for subsequent queries
The content that is displayed to the user can be taken directly from a data source, such as an ObservableCollection, or it may be a query on a data source.
I would like to factor out this repetitive process into a framework where ideally only the following needs to be specified:
Where to display the content in the UI
The UI elements to show while loading, on failure, and on success
The URI of the HTTP request
How to parse the HTTP response into the data structure that will be kept in memory
The location of the file in isolated storage, if it exists
How to parse the file contents into the data structure that will be kept in memory
It may sound like a lot, but two strings, three FrameworkElements, and two methods is less overhead than what I currently have.
Also, this needs to work for however the data is maintained in memory, and needs to work for direct collections and queries on those collections.
My questions are:
Has something like this already been implemented?
Are my thoughts about the topic above fundamentally wrong in some way?
Here is a design I'm thinking of:
There are two components, a View and a Model.
The View is given the FrameworkElements for loading, failure, and success. It is also given a reference to the corresponding Model. The View is a UserControl that is placed somewhere in the UI.
The Model is a class that is given the URI for the data, a method describing how to parse the data, and optionally a filename and how to parse the file. It is responsible for retrieving the data and notifying the View whenever the current status (loading/fail/success) changes. If the data downloaded from the network differs from the cache, the network data takes precedence. When the app closes or is tombstoned, the Model writes the data to the cache.
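For concreteness, here is a rough sketch of the loading flow (Python as neutral pseudocode - the real implementation would be Silverlight/C# - with callables standing in for the delegates described above):

# Sketch of the Model's loading flow; fetch, read_cache, parse and
# notify are injected callables standing in for the delegates above.
class NetworkError(Exception):
    pass

def load(fetch, read_cache, parse, notify):
    notify("loading", None)
    data = read_cache()          # isolated storage; may return None
    try:
        data = parse(fetch())    # network data takes precedence over cache
    except NetworkError:
        if data is None:
            notify("failure", None)  # nothing cached and no network
            return None
    notify("success", data)      # keep in memory for subsequent queries
    return data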
How does that sound?
I took some time to have a good read of your requirements and noted some thoughts to offer as a sounding board.
Firstly, for repetitive tasks with common behaviour this is definitely the way to approach it. You are not alone in thinking about this problem.
People doing a lot of this sort of thing may have created similar abstractions; however, to my knowledge none have been publicly released.
How far you go with it may depend if you intend it to be just for your own use and for those with very similar requirements or whether you want to handle more general cases and make a product that is usable by a very wide audience.
I'm going to assume the former, but that does not preclude the possibility of releasing it as an open source project that can be developed further and/or forked.
By not trying to cater for all possibilities you can make certain assumptions about the nature of the using implementation and in particular UI design choices.
I think overall your thinking is in the right direction. While reading some of your high-level thoughts I considered how some things could be simplified (a good thing) while still delivering a compelling UI.
On your initial points.
You could just assume a performant indeterminate progress bar is being passed in.
Do this if it's important to you, but you could be buying into some complexity here handling different caching requirements - variance in duration or dirty handling. It may be sufficient to lean on the platform's inbuilt caching of URLs (which some people have found gets in their way).
Handle network connectivity, yep this is repetitive and somewhat intricate. A perfect candidate for a general solution.
Update UI... arguably better to just return data and defer decisions regarding presentation and format of data to your individual clients.
Content in main memory - see above on caching.
On your potential inputs.
Where to display content - see above re data and defer presentation choices to client.
I would go with a UI element for the progress indicator, again a performant progress bar. Regarding communication of failure I would consider implementing this in a Completed event which you publish. Then through parameters you can communicate the result and defer handling to the client to place that result in some presentation control/log/whatever. This is consistent with patterns used by the .Net Framework.
URI - yes, this gets passed in.
How to parse - passing in a delegate to convert a stream or string into an object whose type can be decided by the client makes sense.
Location of cache - you could pass this in if generalising matters, or hardcode its path. It would be more useful to others if passed in (consider whether you handle folders/creation).
On the implementation.
You could go with a UserControl, if it works for you to be bound by that assumption. It would be more flexible, though, and arguably equally simple/elegant, to push presentation back to the client for both the data display and status messages, and to control hide/display of the progress bar as passed in.
Perhaps you would go so far as to assume the status messages would always be displayed in a textblock (if passed) and shift that housekeeping from each of your clients into your generic class.
I suspect you will still benefit from not coupling the data format and presentation.
Tombstone handling... I would recommend some testing of the platform's inbuilt caching of URLs here, to see whether its durations/dirty conditions work for your general cases.
Hopefully this gives you some things to think about and some reassurance you're heading down the right path. There are many ways you could go about this. Which is the best path ultimately will be driven by your goals.
I'm developing a WP7 application which is basically a client of an existing REST API. The server returns data in JSON. With the help of the JSON.NET library (http://json.codeplex.com/) I was able to deserialize it directly into my .NET C# classes.
I store the data locally to handle the offline scenario of my application and also to avoid calling the server each time the user launches the application. I provide two ways to refresh the data: manually and/or after a period of time. To store the data I use Sterling (http://sterling.codeplex.com/); it's a simple, easy-to-use local database for Silverlight/WP7.
The biggest challenge is handling the asynchronous communication with the server. I provide clear UI feedback (progress bar and/or loading wheel) to let the user know what's going on.
On a side note, I'm using the MVVM Light toolkit and SL Unit Testing to do integration tests: View Model => my local client code => server. (http://code.google.com/p/nunit-silverlight/wiki/NunitTestsWp7)

Where to perform the data validation for a desktop application? On the database or in code?

In a single-user desktop application that uses a database for storage, is it necessary to perform the data validation on the database, or is it ok to do it in code? What are the best practices, and if there are none, what are the advantages and disadvantages of each of the two possibilities?
Best practice is both. The database should be responsible for ensuring its own state is valid, and the program should ensure that it doesn't pass rubbish to the database.
The disadvantage is that you have to write more code, and you have a marginal extra runtime overhead - neither of which are usually particularly good reasons not to do it.
The advantage is that the database ensures low-level validity, but the program can help the user to enter valid data much better than by just passing back errors from the database - it can intervene earlier and provide UI hints (e.g. colouring invalid text fields red until they have been completed correctly, etc)
-- edit (more info promoted from comments) --
The smart approach in many cases is to write a data-driven validator at each end and use a shared data file (e.g. XML) to drive the validations. If the spec for a validation changes, you only need to edit the description file and both ends of the validation stay in sync (no code change).
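A minimal sketch of that idea, assuming a shared JSON rules file (the file name and rule schema are illustrative):

# Data-driven validator sketch: the GUI and the database layer load
# the same rules file, so a spec change is a file edit, not a code
# change (file name and rule schema are assumptions).
import json

with open("validation_rules.json") as f:
    RULES = json.load(f)  # e.g. {"age": {"min": 0, "max": 150}}

def is_valid(field, value):
    rule = RULES.get(field, {})
    if "min" in rule and value < rule["min"]:
        return False
    if "max" in rule and value > rule["max"]:
        return False
    return True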
You do both.
The best practice for data validation is to sanitize your program's inputs to the database. However, this does not excuse the database from having its own validations. Putting your validations only in your code accounts only for changes produced in your managed environment. It does not account for corrupted databases, administration error, or the remote/future possibility that your database will be used by more than one application, in which case the application-level validation logic would have to be duplicated in each new application.
Your database should have its own validation routines. You needn't think of them as cleaning the incoming data so much as running sanity checks/constraints/assertions. At no time should a database contain invalid data. That's the entire point of integrity constraints.
To summarize, you do both of:
Sanitize and validate user inputs before they reach your data store.
Equip your data store with constraints that reinforce your validations.
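As a sketch of both layers in one place - using Django here, since it lets you declare database constraints next to application validators (Django 2.2+; the model and names are illustrative, and note that MySQL only enforces CHECK constraints from 8.0.16):

# Application-level validation plus a matching database-level
# constraint (illustrative model, not from the original posts).
from django.db import models
from django.core.validators import MinValueValidator

class Payment(models.Model):
    # Checked by full_clean()/ModelForms before the data is saved.
    amount = models.DecimalField(
        max_digits=10, decimal_places=2,
        validators=[MinValueValidator(0)])

    class Meta:
        constraints = [
            # Enforced by the database itself, so it also protects
            # against other applications and hand-edited data.
            models.CheckConstraint(
                check=models.Q(amount__gte=0),
                name="payment_amount_nonnegative"),
        ]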
You should always validate in the code before the data reaches the database.
Data lasts longer than applications. It hangs around for years and years. This is true even if your application doesn't handle data of interest to regulatory authorities or law enforcement agencies, but the range of data which interests those guys keeps increasing.
Also, it is still more common for data to be shared between applications within an organisation (reporting, data warehouse, data hub, web services) or exchanged between organisations than it is for one application to use multiple databases. Such exchanges may involve other mechanisms for loading and extracting data besides the front-end application which notionally owns the schema.
So, if you only want to code your data validation rules once, put them in the database. If you like belt and braces, put them in the GUI as well.
Wouldn't it be smart to check the data before you try to store it? Database connections and resources are expensive. Make sure you have some sort of logic to validate the data before shipping it off to the database. I've seen some people do it on the front end, others on the back end, and others both.
It may be a good idea to create an assembly or validation tier: validate the data, then ship it over to the database.
In the application please!
It's very difficult to translate SQL error -12345 into a message that means anything to an end user. In many cases your user may be long gone by the time the database gets hold of the data (e.g. I hit submit, then go look to see how many downvotes I got on Stack Overflow today).
The first priority is to validate the data in the application before sending it to the database.
The second priority should be to validate/screen the data at the front end, to prevent the user from entering invalid data or at least warn them immediately that the data is incorrect.
The third priority (if the application is important enough and your budget is big enough) would be for the database itself to verify the correctness of any inserts and updates via constraints, triggers, etc.

Database Driven Front End Controller / Page Management Good or Bad?

I am currently working within a custom framework that uses the database to set up a Page Object containing information about the Module, View, Controller, etc., which a Front Controller then uses to handle routing and the like within an MVC (obviously) pattern.
The original reason for handling pages within the database was that we needed to be able to create new landing pages on the fly from within an admin interface, and we also needed to create onLoad and onUnload events to which other dynamic objects could be attached.
However, after reading this post yesterday, it made me wonder if we shouldn't move this handling out of the database and make it all file-structure- and code-driven, like other frameworks, so that pages can be tested without the database being a component.
I am currently looking at whether to scrap the custom framework and go with one of the standard frameworks and extend it (which is what's most likely right now). I'm wondering whether to extend the framework to handle page requests through the database like we do now, or simply go with whatever routing/handling mechanism comes with the framework.
Usually I'm pretty lenient about what I will allow to go on in a "toy" application, but I think there are some bad habits that should be avoided no matter what. Databases are powerful tools, with reasonably powerful languages (via stored procedures) for doing whatever you need done... but they really should be used for storing and scaling access to data, and for enforcing your low-level data consistency rules.
Putting business logic in the data layer was common years ago, but separation of concerns really does help with the maintainability of an application over its lifespan.
Note that there is nothing wrong with using a database to store page templates instead of the file system. The line between the two will blur even further in the future, and I have one system where all of the templates are in the database because of problems with low-budget hosting and how dynamically generated content needed to be saved. As long as your framework can pull a template out of a file or a field equally easily and process it, it isn't going to matter much either way.
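To illustrate how thin that abstraction can be, here is a sketch of a database-backed template loader, using Django as a stand-in since the question's framework is custom (the PageTemplate model is hypothetical; the Loader API is Django's own):

# Sketch: templates served from a table instead of the filesystem.
from django.template import Origin, TemplateDoesNotExist
from django.template.loaders.base import Loader

class DatabaseLoader(Loader):
    def get_template_sources(self, template_name):
        yield Origin(name=template_name, template_name=template_name,
                     loader=self)

    def get_contents(self, origin):
        from myapp.models import PageTemplate  # hypothetical model
        try:
            return PageTemplate.objects.get(name=origin.name).content
        except PageTemplate.DoesNotExist:
            raise TemplateDoesNotExist(origin)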
On the other hand, the post from yesterday was about generating the UI-layer elements directly from the database (at least, that's how I read it), not about an unusual storage location for templates. That is a deeper concern for the reasons mentioned... the database becomes locked to web apps, and only web apps.
And on the third hand, never take other people's advice too much to heart if you have a system that works well and is easy to extend. Every use case is slightly different. If your maintainability isn't suffering and it serves the business need, it is good enough.