We have come up with a stream strategy that has a main integration stream and several other streams for the other environments. A brief outline is shown below.
Integration Stream
-- Production
-- Development
---- Stream for release 1.0
---- Stream fix release 1.0
We intend to use the child streams for development, deliver the work to the Development stream, select the activities, create a baseline, and promote it to the integration stream.
I read a couple of highly informative posts on this forum and I have a few questions to begin with. I am new to the UCM environment and finding it hard to get the broader picture of its usage.
Should the baselines be created on the development branch or the integration branch?
How does the baseline go through the life cycle?
Does the baseline we create only change in promotion level after testing, release, etc.?
It would be very helpful if anyone could describe this process.
Thanks in advance.
regards
1/ Should the baselines be created on the development branch or the integration branch?
You can create a baseline on any stream you want. They are just labels on the writable components.
Intermediate baselines can later be "obsoleted" (made locked and invisible) if you need to clean them up.
Your Development stream should be renamed to "Consolidation", because this is where you will consolidate what will actually go to Production (and be delivered to the "Integration Stream"; your "Production" stream is not needed here).
Since the fixes will begin from an "official" (i.e. "in production") label, I would recommend moving the stream "Stream fix release x.y" below the Integration Stream.
Note: you need to be aware that delivering activities creates a timeline linking all the activities from the source stream. That means you can deliver a partial set of activities from stream A to stream B, but you won't be able to deliver from stream A to stream C (unless you deliver all activities).
In short, baselining and delivering all activities is always simpler.
2/ How does the baseline go through the life cycle?
First, the status ("TESTED", "VALIDATED", ...) is just a meta-attribute you can set to whatever value you want, with no relation to the stream on which the baseline has been created.
The life-cycle is then determined by:
the workflow of merges, allowing you to isolate different development efforts from one branch (built from the stream) to another.
the status (meta-data) you associate with your baseline.
Related
I am working on a distributed system (eCommerce) and using Kafka events for communication between systems. According to our business logic, we first publish to a Kafka topic (which succeeds) and after that we update the Oracle database. Sometimes this database update fails. How do we maintain state consistency between the systems? The other system will update its database with the new status of the order while the producer will still have the old status, so how do we reduce this inconsistency?
What you are doing is actually called dual writing, which can cause perpetual inconsistency due to many ugly problems such as partial failures (as in your case), race conditions, ...
An approach I find quite interesting is using change data capture (CDC) to keep the different data systems in sync with each other.
I highly recommend this talk and blog post by Martin Kleppmann about how to use this kind of architecture:
Using logs to build a solid data infrastructure
Staying in Sync: from Transactions to Streams
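To make the idea concrete, here is a minimal sketch of the usual way to avoid the dual write: update the order and insert an event row into an "outbox" table inside the same local Oracle transaction, and let a CDC connector such as Debezium tail that table and publish to Kafka. The orders and order_events tables, column names, and connection details below are all hypothetical, not your actual schema.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class OrderService {

        // Hypothetical connection details; replace with your own Oracle DSN and credentials.
        private static final String JDBC_URL = "jdbc:oracle:thin:@//db-host:1521/ORCL";

        public void updateOrderStatus(long orderId, String newStatus) throws Exception {
            try (Connection conn = DriverManager.getConnection(JDBC_URL, "app", "secret")) {
                conn.setAutoCommit(false);
                try {
                    // 1. Update the business table.
                    try (PreparedStatement ps = conn.prepareStatement(
                            "UPDATE orders SET status = ? WHERE order_id = ?")) {
                        ps.setString(1, newStatus);
                        ps.setLong(2, orderId);
                        ps.executeUpdate();
                    }
                    // 2. Record the event in an outbox table within the SAME transaction.
                    //    A CDC connector (e.g. Debezium) tails this table and publishes to Kafka,
                    //    so the event is emitted if and only if the database change committed.
                    try (PreparedStatement ps = conn.prepareStatement(
                            "INSERT INTO order_events (order_id, event_type, payload) VALUES (?, ?, ?)")) {
                        ps.setLong(1, orderId);
                        ps.setString(2, "ORDER_STATUS_CHANGED");
                        ps.setString(3, "{\"orderId\":" + orderId + ",\"status\":\"" + newStatus + "\"}");
                        ps.executeUpdate();
                    }
                    conn.commit();   // single atomic commit: no dual write
                } catch (Exception e) {
                    conn.rollback();
                    throw e;
                }
            }
        }
    }

With this arrangement the producer no longer writes to Kafka directly, so there is no window in which the topic and the database can disagree; the CDC connector is the only publisher.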
I'm building a Flink Streaming system that can handle both live data and historical data. All data comes from the same source and is then split into historical and live. The live data gets timestamped and watermarked, while the historical data is received in order. After the live stream is windowed, both streams are unioned and flow into the same processing pipeline.
I cannot find anywhere if all records in an EventTime streaming environment need to be timestamped, or if Flink can even handle this mix of live and historical data at the same time. Is this a feasible approach or will it create problems that I am too inexperienced to see? What will the impact be on the order of the data?
We have this setup to allow us to do partial-backfills. Each stream is keyed by an id, and we send in historical data to replace the observed data for one id while not affecting the live processing of other ids.
Generally speaking, the best approach is to have proper event-time timestamps on every event, and to use event-time everywhere. This has the advantage of being able to use the exact same code for both live data and historic data -- which is very valuable when the need arises to re-process historic data in order to fix bugs or upgrade your pipeline. With this in mind, it's typically possible to do backfill by simply running a second copy of the application -- one that's processing historic data rather than live data.
As for using a mix of historic and live data in the same application, and whether you need to have timestamps and watermarks for the historic events -- it depends on the details. For example, if you are going to connect the two streams, the watermarks (or lack of watermarks) on the historic stream will hold back the watermarks on the connected stream. This will matter if you try to use event-time timers (or windows, which depend on timers) on the connected stream.
I don't think you're going to run into problems, but if you do, a couple of ideas:
You could go ahead and assign timestamps on the historic stream, and write a custom periodic watermark generator that always returns Watermark.MAX_WATERMARK. That will effectively disable any effect the watermarks for the historic stream would have on the watermarking when it's connected to the live stream.
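As a minimal sketch of that first idea, using the classic AssignerWithPeriodicWatermarks interface from the older Flink DataStream API (newer releases express the same thing with WatermarkStrategy/WatermarkGenerator); the HistoricEvent type and its timestamp field are hypothetical placeholders for your own record type:

    import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
    import org.apache.flink.streaming.api.watermark.Watermark;

    public class MaxWatermarkAssigner
            implements AssignerWithPeriodicWatermarks<MaxWatermarkAssigner.HistoricEvent> {

        // Hypothetical stand-in for your own historical record type.
        public static class HistoricEvent {
            public long timestampMillis;   // event time in epoch milliseconds
        }

        @Override
        public long extractTimestamp(HistoricEvent element, long previousElementTimestamp) {
            // Give every historical record a proper event-time timestamp.
            return element.timestampMillis;
        }

        @Override
        public Watermark getCurrentWatermark() {
            // Always report "event time has advanced as far as it can", so this stream
            // never holds back the watermark of the stream it is unioned/connected with.
            return Watermark.MAX_WATERMARK;
        }
    }

    // Usage (hypothetical stream variable):
    // DataStream<MaxWatermarkAssigner.HistoricEvent> historic = ...;
    // historic = historic.assignTimestampsAndWatermarks(new MaxWatermarkAssigner());

Since the combined watermark is the minimum of the input watermarks, a historic stream that always reports MAX_WATERMARK leaves the live stream's watermark in control.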
Or you could decouple the backfill operations, and do that in another application (by putting some sort of queuing in-between the two jobs, like Kafka or Kinesis).
Our Windows ClearCase installation is getting very slow, and it takes more than one hour to get down to the user's stream and create a view. How can we boost the speed of ClearCase on Windows? We have tried marking unused streams as "obsolete", but it didn't help much.
Obsoleting streams won't change anything: ClearCase would still have to manage the same number of UCM objects (it might only speed up the display of projects in the ClearCase Project Explorer).
Various reasons can explain performance issues on Windows/ClearCase, depending on the version of Windows and of ClearCase:
registry issue (MiniFilterMask key)
license caching
Network provider order
One way of detecting performance issues is to use the Rational ClearCase Reports (Windows only), which can help keep track of the performance of commands/scripts you execute on a regular basis.
You can also mitigate the performance issue by setting up another VOB server/view server (with its own registry server) to register a subset of the same VOBs, and see if the speed improves: make sure those servers are on the same box as the VOBs they refer to, so that they register only VOBs with a local path (and not a network path, which can be slower to access).
In UCM, we sometimes need to create activities for experimental purposes.
They may or may not be included as part of the final delivery.
If we do not want to deliver such an activity, we will not deliver it to the integration stream.
But the problem here is dependency: sometimes the experimental activity creates a dependency with another activity, and we are forced to deliver it.
Is there any way to safely do experimentation without any side effects?
Is it possible to delete the activity and its corresponding change set as if it had never been added to ClearCase at all?
The safest way is to isolate that experimentation in its own UCM Stream.
Because if you don't, you may be able to do partial delivers for a while, before being forced to deliver all your activities: see "Clearcase UCM - Cross delivering vs. delivering upwards?".
The other dependency issue is file-based (when your activities to be delivered are based on versions created in activities for experiment). That is another argument for isolating said experiment in its own Stream.
And that would make deleting an activity quite dangerous.
Deleting an activity is only possible if it is empty, meaning you have moved all its versions to another activity (which solves nothing), or you have removed them with rmver.
And you should avoid deleting a version (too dangerous in ClearCase UCM).
With a dedicated Stream, you are sure to deliver all activities, or to deliver none.
The subtractive merge mentioned by Tamir certainly isn't a solution, especially when you have many activities to cancel (ie when you have been forced to deliver many activities).
You do have a script to cancel an activity (see "Reverse Changset of an activity in Clearcase"), but that will pollute your history with many additional versions.
Besides, you can do a subtractive merge. However, it's quite dangerous and you should do it very carefully. You can find more info here:
http://www-01.ibm.com/support/docview.wss?uid=swg21123001
Currently we are evaluating several key+value data stores to replace an older ISAM currently in use by our main application (for 20-something years!) ...
The problem is that our current ISAM doesn't support crash recovery.
So LevelDB seemed OK to us (we are also checking BerkeleyDB, etc.).
But we ran into the question of hot backups, and, given the fact that LevelDB is a library and not a server, it is odd to ask for a 'hot backup', as it would intuitively imply an external backup process.
Perhaps someone would like to propose options (or known solutions)?
For example:
- Hot backup through an inner thread of the main application?
- Hot backup by merely copying the LevelDB data directory?
Thanks in advance
You can do a snapshot iteration through a LevelDB, which is probably the best way to make a hot copy (don't forget to close the iterator).
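As a rough sketch, here is what that snapshot iteration could look like with the org.iq80.leveldb Java port (the C++ library follows the same GetSnapshot/NewIterator pattern); the source and backup paths below are made up for the example.

    import java.io.File;
    import java.util.Map;

    import org.iq80.leveldb.DB;
    import org.iq80.leveldb.DBIterator;
    import org.iq80.leveldb.Options;
    import org.iq80.leveldb.ReadOptions;
    import org.iq80.leveldb.Snapshot;

    import static org.iq80.leveldb.impl.Iq80DBFactory.factory;

    public class HotCopy {
        public static void main(String[] args) throws Exception {
            DB source = factory.open(new File("/data/source-db"), new Options().createIfMissing(true));
            DB backup = factory.open(new File("/data/backup-db"), new Options().createIfMissing(true));

            // Take a snapshot: the iteration below sees a consistent point-in-time view,
            // even while other threads keep writing to 'source'.
            Snapshot snapshot = source.getSnapshot();
            try (DBIterator it = source.iterator(new ReadOptions().snapshot(snapshot))) {
                for (it.seekToFirst(); it.hasNext(); ) {
                    Map.Entry<byte[], byte[]> entry = it.next();
                    backup.put(entry.getKey(), entry.getValue());
                }
            } finally {
                snapshot.close();   // release the snapshot once the copy is done
            }

            backup.close();
            source.close();
        }
    }

Because the iterator is pinned to the snapshot, the copy is consistent even while the application keeps writing; releasing the snapshot afterwards lets LevelDB compact away the old versions it was holding on to.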
To backup a LevelDB via the filesystem I have previously used a script that creates hard links to all the .sst files (which are immutable once written), and normal copies of the log (and MANIFEST, CURRENT etc) files, into a backup directory on the same partition. This is fast since the log files are small compared to the .sst files.
The DB must be closed (by the application) while this backup runs, but the time taken will obviously be much less than the time needed to copy the entire DB to a different partition or upload it to S3, etc. That fuller copy can then be made from the backup directory once the DB has been re-opened by the application.
LMDB is an embedded key value store, but unlike LevelDB it supports multi-process concurrency so you can use an external backup process. The mdb_copy utility will make an atomic hot backup of a database, your app doesn't need to stop or do anything special while the backup runs. http://symas.com/mdb/
I am coming a bit late to this question, but there are forks of LevelDB that offer good live backup capability, such as HyperLevelDB and RocksDB. Both of these are available as npm modules, i.e. level-hyper and level-rocksdb. For more discussion, see How to backup RocksDB? and HyperDex Question.