How is singleton code+data handled in scale-out architectures? - database

This is more of a conceptual question but answers specific to opensource products like (JBoss, etc) are also welcome.
If my enterprise app needs to scale and I want to choose the scale-out model (instead of the scale-up) model, how would the multiple app server instances preserve the singleton semantics of a piece of code/data?
Example: Let's say, I have an ID-generation class whose logic demands that it be instantiated as a singleton. This class may or may not talk to the underlying database. Now, how would I ensure that the singleton semantics of this class are maintained when scaling out?
Secondly, is there a book or an online resource that both lists such conceptual issues and suggests solutions?
EDIT: In general, how would one handle generic, application state in the app server layer to allow the application to scale out? What design patterns, software components/products, etc I should be exploring further?

The further you scale out, the less able you are going to be to manage global static atomically. In other words, if you have 100 servers that need to share state (knowing which ID is next in an ID generating singleton class), then there is no technology I know of that will quickly and atomically get that ID for you.
Data has to travel from machine to machine in regards to the ID generation.
There are a few options I can think of for the scenario you mentioned:
Wait for all machines to catch up/sync before accepting a new ID. You could generate the ID locally and then check that it's good across other machines - or - run a job to get the next ID across all machines (think map/reduce).
Think sharding. With sharding you can generate IDs "locally" and be guaranteed to have uniqueness. So if you had 100 machines, machines 1-10 are for users in California, machines 11-20 are for users in New York, etc. Picking a sharding key can be tough.
Start looking to messaging systems. You would create/modify your object locally on a machine and then send the result to a service bus/messaging system and the other machines subscribe to a topic/queue and can get the object and process it.
Pick a horizontally scalable database to manage objects. They've already solved the issues of syncing and replication.

Related

Anylogic 7. Databases modelling

I am looking for the best way to model a database system.
It should be made of nodes, edges and data query flows.
I know there is a flow lib, but i dont sure that it is usable for such things.
So, the question is: is there any libs that i could use for this purpose? Or i should mostly use my own types, agents etc.?
The fuild library (if you meant that) is not useful for that purpose.
If you want to model the flow of data through a system of nodes, you might want to start with a simple process-modelling approach where data items are agents flowing through queues, delays and service objects...
However, depending on what your database system is doing (I am no expert there), you might actually need to switch to a pure agent-based approach sooner or later (i.e. replace the process library objects with your own functionality).
In short: start with process modelling and introduce agent-based functionality over time...
If you are new to AnyLogic I suggest you follow the logic in the tutorial for agent based modeling. Look at it as if the distributor is your server, the retailers your clients and the orders your queries. You can use GIS maps if you are concerned about the real location of servers and clients or use other network capabilities (or agent connections) if the actual locations are not important in your model.

How to structure/coordinate multiple databases?

Imagine a large corp with dozens of companies, each with their own website and each website will have their own unique functional requirements
Most data on each website will be specific to that website
Each website can edit its own data
Some data will be shared across all websites
There will be a central CMS that is allowed to edit this data, but other websites can read and use that data
e.g. say you're planning the infrastructure for a company that owns multiple sub-companies that make different kinds of products, some in the same category (cereal, food), others in completely different categories (books, instruments). Some are marketing websites, some are for CRM, some are online stores
there are a list of regulatory requirements that affect all products
each company should manage the status of compliance of its own products to each requirement
when a new requirement surfaces, details regarding that requirement should only be entered once
How would the multiple databases be coordinated?
edit: added more info per Bob's suggestions
Thanks for the incredibly insightful questions!
compliance data is not shared, silo'd within each site
shared data is only on the one enterprise-wide database, they will mostly be "types of [thing]"
no conclusive list of instances where they'll be used but currently it'd be to populate CMS dropdowns for individual sites.
changes to shared data would occur a few times a year.
Ideally changes would be reflected within a few minutes, but an hour or so should be acceptable
very low volume in shared data.
All DBs will be new, decision on which DB is pending current investigation.
Sub-systems will expose REST api
Here are some ways I have seen this handled, you need to think about the implications of each structure based on the details of your particular business domain. All can work, but all have to be carefully set up if they are going to work.
One database for shared information and one for each client for client-specific information. Set up the overall application so that the first thing you put in the application on log in is the client and it connects to the correct client. People might have to also have a way to change the client if users will handled multiples.
Separate servers for each client if they completely need to be siloed. Database changes are by script (and in source control) and are applied to each server as need be. So the changes to the central database might have a job that runs to push any data changes to the other servers
All the data in one database, but making sure each table has a client_id so that the data is always filtered correctly by client. You can set up separate views by client, so that the users can only see the clients they are supposed to see. This only works if the data for each client is substantially in the same form.
And since you are in a regulatory environment, I strongly urge that you create an audit database that is updated by database triggers (never audit from the application, you will lose changes to the data) for each database.
I agree with Chris that, even after both the sets of questions, there is still a big set of possible solutions. For instance, if the databases were the same technology, and the shared data were stored in the same way in each one, you could do db-level replication from the central db to the others. Is it OK to have 2 separate dbs per application (one with shared stuff and one with not-shared?) - this would influence the kind of replication.
Or you could have a purely code solution, where clicking publish in a GUI that updates the central db calls a set of APIs that also update the other dbs. Or micro-services - updating the central db also creates a message on a shared queue, that is picked up by services that each look after a different db and apply the updates in whatever form makes sense for that db.
It depends on (among the things already mentioned) what your organisation's technology strategy is, what technology and skills you already have in-house, and so on.
So this is as much an architecture question as it is a db question.
I don't think this question is sufficiently clear to get a single answer. However there are a few possibilities.
In many cases, where you have shared data you want to have a single point of ownership of that information. It could be in a database, in an excel file (which can then be turned into csv and periodically loaded on all dbs), or some other form. The specifics depend on what is shared exactly.
Now in this case it sounds like you are going to have some sort of legal department in charge of some shared information and they will manage that data, which will then be shared to the other sites. This might be done with an application they manage which aggregates information from the other companies or it could be data which is pushed to their systems.
A final point:
Software is at its best when it facilitates human solutions to human problems, not when it tries to solve those problems directly. In these cases, you probably want a good human solution in place and then to look at what software can do to support that. A lot of the issues (who owns the information?) will already have been solved and you will be simply automating what is already done.

Where should i access my Database

I'm curious how you would handle following Database access.
Let's suggest you have a Computer which Hosts your database as part of his server work and multiple client PC's which has some client-side-software on it that need to get information from this database
AFAIK there are 2 way's to do this
each client-side-software connects directly to the Database
each client-side-software connects to a server-side-software which connects to the Database as some sort of data access layer.
so what i like to know is:
What are the pro and contra's of each solution?
And are other solutions out there which maybe "better" to do this work
I would DEFINITELY go with suggestion number 2. No client application should talk to a datastore without a broker ie:
ClientApp -> WebApi -> DatabaseBroker.class -> MySQL
This is the sound way to do it as you separate concerns and define an organized throughput to the datastore.
Some benefits are:
decouple the client from the database
you can centralize all upgrades, additions and operability in one location (DatabaseBroker.class) for all clients
it's very scaleable
its safe in regards to business logic
Think of it like this with this laymans example:
Marines are not allowed to bring their own weapons to battle (client apps talking directly to DB). instead they checkout the weapon from the armory (API). The armory has control over all weapons, repairs and upgrades (data from database) and determines who gets what.
What you have described sounds like two different kind of multitier architectures.
The first point matches with a two-tier and the second one could be a three-tier.
AFAIK there are 2 way's to do this
You can divide your application in several physical tiers, therefore, you will find more cases suitable to this architecture (n-tier) than the described above.
What are the pro and contra's of each solution?
Usually the motivation for splitting your application in tiers is to achieve some kind of non-functional requirements (maintainability, availability, security, etc.), the problem is that when you add extra tiers you also add complexity,e.g.: your app components need to communicate with each other and this is more difficult when they are distributed among several machines.
And are other solutions out there which maybe "better" to do this work.
I'm not sure what you mean with "work" here, but notice that you don't need to add extra tiers to access a database. If you have a desktop application installed in a few machines a classical client/server (two-tier) model should be enough. However, a web-based application needs an extra tier for interacting with the browser. In this case the database access is not the motivation for adding this extra tier.

Application database/instance decomposintion

I'm designing a service that will serve some business entites. Logically it will be divided into two parts:
Frontend - bells and whistels like Wiki, Pricing, Landing Page, maybe account information (billing, account status, and so on).
Service itself, where business entity's empoyers will do theirs work.
It is play 2.x framework, planning to host in heroku.
It is not clear for now how to decompose intstances and DB stuff.
Should I decompose DB for clients: business entity - one database? Or should I store all data in one database, but add for all tables id of business entity that ownes some row? What issues (performance, administrative, scaling) may come up with this decision?
If I will choose to divide databases, how can I do this? For that I need to launch app instance with DB for client that instance belongs to. Thus we have non-uniform instances that can be obstacle for scaling. And as I know, heroku doesn't support non-uniform (web)instances.
Please help, i'm totally stucked here.
Expected stack:
Scala
Play 2.0
Anorm
JDBC
PostgresSQL
Heroku
All (except Scala, and may be Play 2.0) of this are interchangeable.
This is a pretty classic problem. You have many clients and you wonder if you should create separate databases for each client - or if they should share a database.
I would recommend starting with one shared database and then use that until you out grow it. Think of some of the disadvantages to having each client with their own database instance:
Like you mention the schema management can be tough. You'd need to write tools to maintain all databases across all servers.
If you tell clients you have structured your system this way, some of them might push you to fork the database. In other words they might argue, "I have my own database! I want a new table just for me."
It's a bit harder to run queries across servers/databases. If you wanted to count how many items all clients have, you'd have to think about that a bit.
But if you want to start by sharding based on client (http://en.wikipedia.org/wiki/Shard_(database_architecture)), you might consider:
As mentioned previously, you'll need some tools/scripts to launch a new database instance for a client. Often those tools will need to "seed" the database with configuration information - like populating a states table for addresses.
You'll want to have a tool to monitor/maintain the databases. Start one, stop another, see if one has high CPU usage etc.
You'll need some kind of system to aggregate statistics across all clients.
You'll need a tool to roll out schema changes and a plan on how you can gracefully upgrade the database while their web application is running.
Overall I would advise to start small and simple and only start worrying about scale when you get there.

Any recomendations for an efective way to sync data from one database, to other app's databases?

Here's my problem. I built a web app, and naturally kept the data in a database which describes that app's domain. Afterwords, I built another web app for the same organization, and used a seperate database to describe that app's domain and store data... and naturally a couple more projects came up and for each app I've isolated it's data to a single database. Deveolpment wise, I think it's ok, as I can maintain changes to the data structure and data at the app's database.
Considering these apps belong to the same organization, there tends to be plenty of data replicated between them, like department names, job titles, shop names, etc. Most of these tables hold the same data, but are not exactly the same in each database, and are not always used by all of the apps. Changes to this data, though, needs to be changed at all the apps (sometimes in a diferent ways) creating a growing management "hassle".
So I've been think of a way to get some syncronization between the data. I want an easier management - update at one app (or a central app) and update all the databases as needed by each app - and also a better way to share data between apps (like maybe mash up data from differnt apps in a new app to alow specific analysis). Most of the data I'm refering to is used as contraints more than being core domain concept, describing the organization rather than describing a particular domain.
I'm looking for opinions on some ways to get this done.
My first idea was to grab comun data structures, like the department names' table i mentioned, and stick'em in a core database. Any updates to the data would be done at this database, through a dedicated web app, and I'd apply some sort of Observer or Publisher / Suscriber Pattern for these changes - on changes the app would notify observing apps (through there dedicated webservice) that the changes occured and allow for the app to grab the new data and use it as it needs. GUIDs could be user as a reference to identify the same data throughout the apps. Also, I could build web services for read and search operations that don't need to be in a specific app's database, but could be useful to it.
A second idea would be that each app manage it's own data, and the apps could observe one another. A change in one could notify others that share the same data structure that the change occurred. I could still use some GUIDs and even build services on any of the apps. I think this would also be less excessive in terms of duplication of data, but might be harder to manage as each app would eventually be coupled to other apps, and I would some how have to distribute responsabilities as to which app controls what information.
I'm really curious as to something of this genre of data distibuition and syncing would work and even be recomended. Opions and other ideas are more than welcome!
What you describe here is a typical case for a "Master Data Management" system. EAI vendors (Oracle, TIBCO, IBM) offer such products. They resemble your first solution, being centralised databases with synchronization processes, detecting changes in external data sources, grabbing the changes and synchronizing data out to other external databases. They also provide a user interface to change master data directly.
MDM software are expensive, but you can implement a custom solution which will be - at least initially - cheaper than purchasing one. Both of your solutions make technical sense but there is a difference in their manageability.
The first one is better, if you can dedicate a responsible person/organization to take care of it and the business owners of your services can agree on making changes via this new centralised system.
The second solution shares the responsibility between the service owners. The hard task here is to identify the owner of each type of information (business object).
I cannot advise a solution without a deeper knowledge of your systems and organizations, but I hope I could give some ideas.

Resources