How do you manage static data for microservices?

For a database-per-service architecture, how do you guys manage your static data for each microservice? I want to make it easy for a new developer to jump in and get everything up and running easily on their local machine. I'm thinking of checking the entire database, static data included, into source control and using Docker bind mounts so people can just docker-compose up the database service locally (along with whatever other infrastructure services they might need to run and test their microservice).
I know each microservice might need to handle this in their own way, but I'd like to provide a good default template for people to start with.
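Concretely, something like this docker-compose sketch is what I have in mind (image, paths and credentials are illustrative), with ./db-data checked into the repo:

services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: localdev   # throwaway local-only credential
    ports:
      - "5432:5432"
    volumes:
      # bind mount: the pre-seeded data directory ships with the repo
      - ./db-data:/var/lib/postgresql/data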

Making a standard for how to do this sort of goes against the reason for making microservices, i.e. that you can adapt each microservice to the context it exists in.
That being said, Postgres, Mongo and MySQL all run scripts in /docker-entrypoint-initdb.d when initializing a fresh database instance. The scripts have to fit the database obviously, but it's a fairly standardized way of doing it.
Each image's page on Docker Hub describes how to do it.
You can either bake the scripts into a custom image or map them into the directory using a docker-compose volume mapping.
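For example, a minimal docker-compose sketch for Postgres (file names and credentials are illustrative); anything in ./initdb runs once, in alphabetical order, when the data directory is empty:

services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: localdev
    volumes:
      # seed scripts (*.sql, *.sql.gz or *.sh) run only on first initialization
      - ./initdb:/docker-entrypoint-initdb.d:ro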
There are some databases that don't have an easy way to initialize a new database. MSSQL comes to mind. In that case, you might have to handle it programmatically.
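For SQL Server that usually means waiting for the server to accept connections and then running the seed script yourself. A rough shell sketch (the service name, tools path, password variable and /seed mount are all illustrative; newer images ship sqlcmd under /opt/mssql-tools18):

docker-compose up -d mssql
# poll until the server answers, then run the seed script
until docker-compose exec mssql /opt/mssql-tools/bin/sqlcmd \
    -S localhost -U sa -P "$SA_PASSWORD" -Q "SELECT 1" >/dev/null 2>&1; do
  sleep 2
done
docker-compose exec mssql /opt/mssql-tools/bin/sqlcmd \
    -S localhost -U sa -P "$SA_PASSWORD" -i /seed/init.sql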

Related

airflow initdb in directory other than AIRFLOW_HOME?

Question for Apache Airflow / Docker users. I have a Docker Airflow image I've built and I'm trying to use a simple SequentialExecutor / sqlite metadata database, but I'd like to persist the metadata database every time a new container is run. I'd like to do this by mounting a drive on the local machine and having initdb initialize the database somewhere other than AIRFLOW_HOME. Is this possible / configurable somehow, or does anyone have a better solution?
Basically the desired state is:
AIRFLOW_HOME: contains airflow.cfg, dags, scripts, logs whatever
some_other_dir: airflow.db
I know this is possible with logs, so why not the database?
Thanks!
I think the best option is to use docker-compose with a separate container as the metadata database, like this:
https://github.com/apache/airflow/issues/8605#issuecomment-623182960
I use this approach along with git branches and it works very well. The data persists unless you explicitly remove the containers with make rm.
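A stripped-down sketch of that setup (image name and credentials are illustrative; AIRFLOW__CORE__SQL_ALCHEMY_CONN is Airflow 1.x's environment override for the metadata DB connection, and you still run airflow initdb once against it):

version: "3"
services:
  postgres:
    image: postgres:12
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      # named volume: the metadata survives container re-creation
      - airflow-metadata:/var/lib/postgresql/data
  airflow:
    image: my-airflow-image   # hypothetical: the image you've built
    depends_on:
      - postgres
    environment:
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
volumes:
  airflow-metadata: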

Docker Like DB Deployment

I've just finished setting up a dev environment where every developer's /feature/*, /bugfix/* and /hotfix/* git branches are automatically built and deployed to a freshly provisioned Windows Container which hosts the webapp and services, creating a test environment for each branch to be validated before merging into master.
While this is working quite nicely, I've still only got one dev DB per developer, which is shared by all of their branches.
In an ideal world, I would like each of these test containers to use its own isolated DB instance; however, the DB is currently about 50 GB at the smallest I can get it without going and tearing out historical data, which is sometimes useful.
What I would really like to do is create a Docker-like image for this DB and then spawn a new "container" from that image which only keeps track of the diff between its changes and the original, without ever altering the original DB.
Is something like this even possible, or does anyone have any ideas how I might achieve this per-container DB isolation without having to create a full 50 GB DB for each?
OK, so after much flailing around in the dark, I think I've finally come up with a solution. @ErikEJ, thanks, you started me off in the right direction. After looking into DB snapshots on MSSQL, I found the only way to do writable snapshots seemed to be using VSS and actually creating writable disk snapshots. That led me down a long path of first trying locally and failing, then trying to implement iSCSI and still getting nowhere. Then I stumbled upon Hyper-V snapshots, had a look at what was happening under the hood there, and finally came across creating differencing VHDs. So basically my solution is as below.
Create a VHD containing a sanitised copy of my production DB's .mdf and .ldf files.
Then I create differencing VHDs for each environment I need, mount each one in its own folder, e.g. c:\db\Issue-1234, and create a new DB, DB_ISSUE-1234, by attaching to the files in that folder. Instead of having several 50-60 GB copies of the DB, I have only one, and the differencing VHDs store just the differences.
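A rough sketch of the moving parts (all paths and names illustrative; this assumes the Hyper-V PowerShell module, and note Mount-VHD assigns a drive letter unless you add a folder access path afterwards):

# PowerShell: child VHD that records only changes relative to the read-only parent
New-VHD -Path 'C:\db\Issue-1234.vhdx' -ParentPath 'C:\db\BaseDb.vhdx' -Differencing
Mount-VHD -Path 'C:\db\Issue-1234.vhdx'

and then attach the files in T-SQL:

CREATE DATABASE [DB_ISSUE-1234]
    ON (FILENAME = 'F:\Data\BaseDb.mdf'),
       (FILENAME = 'F:\Data\BaseDb_log.ldf')
    FOR ATTACH;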
I've just got this working with two or three, so I'm not sure how robust it is or how fast these differencing VHDs will grow, but it's looking very promising so far and is allowing me to spin up multiple environments for testing extremely quickly (in fact it's all automated by deployment scripts).
Hope this helps someone else save some time one day, and please let me know if anyone has figured out a more efficient/quicker/better way to do this :)

EpiServer CMS - get all changed properties

I have an EPIServer CMS. I have a staging instance and a production instance. I want to be able to edit properties/texts in the staging instance, and then in one operation migrate all the new values to production. What is the easiest way to do this?
I suppose I should do something like programmatically enumerating all properties changed since a given timestamp, saving the key/values to a file, and then updating production from the file. Or is there a better built-in way to achieve the same?
Not built in. If your stage DB is a copy of production when you start, you can export the pages from stage (including page types) and then import them to production, but they will get new IDs and you'd have to delete the originals. You would also lose any updates made to production during development. I think you're better off writing that XML exporter/importer.
Looks like Episerver Mirroring will solve your problem. You can use mirroring to move content from staging to prod with a scheduled job or by running the job manually.

How to determine at runtime if I am connected to production database?

OK, so I did the dumb thing and released production code (C#, VS2010) that targeted our development database (SQL Server 2008 R2). Luckily we are not using the production database yet so I didn't have the pain of trying to recover and synchronize everything...
But, I want to prevent this from happening again when it could be much more painful. My idea is to add a table I can query at startup and determine what database I am connected to by the value returned. Production would return "PROD" and dev and test would return other values, for example.
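For example, something like this (table and values made up):

-- created once per environment, seeded with that environment's marker
CREATE TABLE EnvironmentMarker (EnvName VARCHAR(10) NOT NULL);
INSERT INTO EnvironmentMarker (EnvName) VALUES ('PROD');  -- 'DEV' / 'TEST' elsewhere

-- queried by the app at startup
SELECT EnvName FROM EnvironmentMarker;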
If it makes any difference, the application talks to a WCF service to access the database so I have endpoints in the config file, not actual connection strings.
Does this make sense? How have others addressed this problem?
Thanks,
Dave
The easiest way to solve this is to not have access to production accounts. Those are stored in the Machine.config file for our .NET applications. In non-.NET applications this is easily duplicated by having a config file in a common location, or (dare I say) a registry entry which holds the account information.
Most of our servers are accessed through aliases too, so no one really needs to change the connection string from environment to environment. Just grab the user from the config, and the server alias in the hosts file points you to the correct server. This also removes the headache of having to update all our config files when we switch DB instances (change hardware, etc.).
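Roughly, the idea is this (alias, address and names are all illustrative): every environment ships the same config, and the hosts file decides where the alias points.

# hosts file on a dev box (C:\Windows\System32\drivers\etc\hosts)
10.1.2.3    appdb

<!-- shared config fragment: nothing environment-specific in it -->
<connectionStrings>
  <add name="AppDb"
       connectionString="Data Source=appdb;Initial Catalog=MyApp;Integrated Security=True"
       providerName="System.Data.SqlClient" />
</connectionStrings>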
So even with the ClickOnce deployment and the endpoints, you can publish a new endpoint URI in a machine config on the end user's desktop (I'm assuming this is an internal application) and then reference that in the code.
If you absolutely can't do this because it would be a lot of work (the last place I worked had 2000 call center people, so this push was a lot more difficult, but still possible), you can always set up an automated build server which modifies the app.config file as a last step of building the application. You then ALWAYS publish the compiled code from the automated build server. Never make changing the app.config for something like this a manual step in the developer's process; that will always lead to problems at some point.
Now if none of this works, your final option (done this one too), which I hated, but it worked, is to look up the value off of a mapped drive. Essentially, everyone in the company has a mapped drive to, say, R:. This is where you have your production configuration files etc. The prod account people map to one drive location with the production values, and the devs etc. map to another with the development values. I hate this option compared to the others, but it works, and it can save you in a pinch when the others become tedious and difficult (due to, say, office politics, or setting up a build server).
I'm assuming your production server has a different name than your development server, so you could simply SELECT @@SERVERNAME AS ServerName.
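In the app, that check might look like this quick sketch (connectionString and expectedServerName are illustrative placeholders; uses System.Data.SqlClient):

// fail fast at startup if the server we reached isn't the one this build expects
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT @@SERVERNAME", conn))
{
    conn.Open();
    var server = (string)cmd.ExecuteScalar();
    if (!string.Equals(server, expectedServerName, StringComparison.OrdinalIgnoreCase))
        throw new InvalidOperationException("Connected to unexpected server: " + server);
}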
Not sure if this answer helps you in an assumed .NET environment, but within a *nix/PHP environment, this is how I handle the same situation.
"OK, so I did the dumb thing and released production code"
There are times where some app behavior is environment dependent, as you alluded to. In order to provide the ability to check between development and production environments, I added the following line to the global /etc/profile.d/custom.sh config (CentOS):
export SERVICE_ENV=dev
And in code I have a wrapper method which grabs an environment variable by name and localizes its value, making it accessible to my application code. Below is a snippet demonstrating how to check the current environment and react accordingly (in PHP):
public function __call($method, $params)
{
    // Reduce chatter on production envs:
    // only display debug messages if an override told us to
    if (($method === 'debug') &&
        (CoreLib_Api_Environment_Package::getValue(CoreLib_Api_Environment::VAR_LABEL_SERVICE) === CoreLib_Api_Environment::PROD) &&
        (!in_array(CoreLib_Api_Log::DEBUG_ON_PROD_OVERRIDE, $params))) {
        return;
    }
    // otherwise fall through to the real logging dispatch (elided here)
}
Remember, you don't want to pepper your application logic with environment checks, save for a few extreme use cases like the one demonstrated in the snippet. Rather, you should be controlling access to your production databases using DNS. For example, within your development environment a DB hostname like mydatabase-db would resolve to a local server instead of your actual production server. When you push your code to the production environment, DNS resolves the same hostname to the production server, so your code should "just work" without any environment checks.
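For example (address illustrative), a dev box pins the name locally:

# /etc/hosts on a development machine
127.0.0.1    mydatabase-db

while production DNS resolves the same name to the real database server.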
After hours of wading through textbooks and tutorials on MSBuild and app.config manipulation, I stumbled across something called SlowCheetah - XML Transforms http://visualstudiogallery.msdn.microsoft.com/69023d00-a4f9-4a34-a6cd-7e854ba318b5 that did what I needed in less than an hour. Definitely recommended! From the article:
This package enables you to transform your app.config or any other XML file based on the build configuration. It also adds additional tooling to help you create XML transforms.
This package is created by Sayed Ibrahim Hashimi, Chuck England and Bill Heibert, the same Hashimi who authored THE book on MSBuild. If you're looking for a simple, ubiquitous way to transform your app.config, web.config or any other XML file based on the build configuration, look no further -- this VS package will do the job.
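For instance, a transform file such as app.Release.config can rewrite a connection string per build configuration (names and server are illustrative; the xdt: vocabulary is the standard XML-Document-Transform syntax):

<?xml version="1.0"?>
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <connectionStrings>
    <!-- replace the dev connection string when building Release -->
    <add name="AppDb"
         connectionString="Data Source=PRODSQL;Initial Catalog=MyApp;Integrated Security=True"
         xdt:Transform="SetAttributes" xdt:Locator="Match(name)" />
  </connectionStrings>
</configuration>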
Yeah I know I answered my own question but I already gave points to the answer that eventually pointed me to the real answer. Now I need to go back and edit the question based on my new understanding of the problem...
Dave
I'm assuming your production server has a different IP address. You can simply use:
SELECT CONNECTIONPROPERTY('local_net_address') AS local_net_address

SQLite vs. SQLCE Deployment

I am in the process of writing an offline-capable smart client that will have syncing capability back to the main backend when a connection can be made. As a side note, I considered the Microsoft Sync Framework, but since I'm really only going one-way I didn't feel it would buy me enough to justify it.
The question I have is related to SQLite vs. SQLCE and ClickOnce deployments. I've dealt with SQLite before (impressive little tool) and I've dealt with ClickOnce, but never together. If I set up an installer for my app via ClickOnce, how do I ensure the local database doesn't get wiped out during upgrades? Is it possible to upgrade the database (table structure, etc. if necessary) as part of the installer? Or is it better to use SQLCE for something like this? I definitely don't want to go the route of installing SQL Express or anything, as the overhead would be far too high for what I am doing.
I can't speak about SQLite, having never deployed it, but I do have some info about SQLCE.
First, you don't have to deploy it as a prerequisite. You can just include the DLLs in your project. You can check this article, which explains how. This gives you fine-grained control over what version is being used, and you don't have to deal with installing it per se.
Second, I don't recommend that you deploy the database as a data file and let ClickOnce manage it. When you change that file, ClickOnce will publish it again and put it in the data directory; it will move the previous one into the \pre subfolder, and if you have no code to handle that, your user will lose his data. So if you so much as open the database file to look at the table structure, the file changes, ClickOnce treats it as a new version, and you might be unpleasantly surprised to get a phone call from your user about the data being gone.
If you need to retain the data between updates, I recommend you move the database to the [LocalApplicationData] folder the first time the application runs, and reference it there. Then if you need to do any updates to the structure, you can do them programmatically and control when they happen. This article explains how to do this and why.
The other advantage to putting the data in LocalApplicationData is that if the user has a problem and has to uninstall and reinstall the application, his data is retained.
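In code, that first-run move can be as simple as this sketch (the file name "Data.sdf" and folder name "MyApp" are made up; the deployed copy is assumed to sit next to the executable):

// C#: copy the shipped database to LocalApplicationData on first run,
// then always open it from there so ClickOnce updates never touch it
string dataDir = Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
    "MyApp");
string dbPath = Path.Combine(dataDir, "Data.sdf");
if (!File.Exists(dbPath))
{
    Directory.CreateDirectory(dataDir);
    File.Copy(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Data.sdf"), dbPath);
}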
Regardless of the embedded database you choose, your database file (.sqlite or .sdf) will be part of your project, so you can use the "Build Action" and "Copy to Output Directory" properties of that file to control what happens to it during an install/update.
If you choose "Do not copy" it will not copy the database file and if you choose "Copy if newer" it will only copy if you have a new version of your database file.
You will need to experiment a little, but by using these two properties you can have full control over how and when your database file is deployed/updated...
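Under the hood those are just item metadata in the .csproj, so you can also set them by hand; "Copy if newer" corresponds to PreserveNewest (file name illustrative):

<ItemGroup>
  <None Include="Data.sqlite">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>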