Implementing a Strategy pattern based on another strategy - strategy-pattern

I am trying to design an application that has two work features:
1. Job Work
2. Incident Work
Furthermore, Job Work is divided into 5 different processors (JobWork1 to JobWork5) based on job type, and Incident Work is divided into 7 different processors (IncidentWork1 to IncidentWork7) based on incident type.
Please help me understand how I can combine the two strategies here.
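One possible way to read "combining two strategies" is a two-level lookup: an outer strategy selects the work feature, and an inner strategy selects the processor for the job or incident type. The sketch below is only an illustration under that assumption. The processor bodies, the `dispatch` function and the payload shape are hypothetical; only the JobWork1 to JobWork5 and IncidentWork1 to IncidentWork7 names come from the question.

```python
from typing import Callable, Dict

# A processor is any callable that takes a payload and returns a result.
Processor = Callable[[dict], str]


def make_processor(name: str) -> Processor:
    """Build a placeholder processor; real ones would hold the actual logic."""
    def process(payload: dict) -> str:
        return f"{name} handled {payload['id']}"
    return process


# Inner strategies: one lookup table per work feature, keyed by type number.
JOB_PROCESSORS: Dict[int, Processor] = {
    n: make_processor(f"JobWork{n}") for n in range(1, 6)
}
INCIDENT_PROCESSORS: Dict[int, Processor] = {
    n: make_processor(f"IncidentWork{n}") for n in range(1, 8)
}

# Outer strategy: picks the table that belongs to the work feature.
WORK_FEATURES: Dict[str, Dict[int, Processor]] = {
    "job": JOB_PROCESSORS,
    "incident": INCIDENT_PROCESSORS,
}


def dispatch(feature: str, work_type: int, payload: dict) -> str:
    """Select the feature first, then the processor for the given type."""
    try:
        processor = WORK_FEATURES[feature][work_type]
    except KeyError:
        raise ValueError(f"no processor for {feature!r} type {work_type!r}") from None
    return processor(payload)


print(dispatch("job", 3, {"id": 42}))
```

Adding a new job type or a third work feature then only means adding an entry to one of the tables; the calling code does not change.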

Related

CRISP-DM - Timing for each of the tasks?

I have what may be a simple question.
Using CRISP-DM, we have 6 tasks that have to be followed.
How do I estimate the amount of time needed for each of the tasks?
P.S. As an assumption, say Data Collection needs 3 days.
This is what the question looks like.
There is no general rule.
Every project is very different.
For example, one project may already have all its data, and thus need 0 days to get the data.
Usually, there will be some manager preventing access to the data you need, and then it will take at least 6 months and C-level activity to get the data to you. And absolutely no progress will be possible before seeing the data.
So just plan 0-12 months on every step.
Also, don't forget that it is an iterative process, so you will need to restart again anyway. In my opinion, CRISP-DM is dead. Business people love it because it gives them the impression of "managing" things, but it doesn't work that way in reality; it is just theater you do for the managers.

Database Synchronization Algorithm Advice

I'm working on an application that needs a data synchronization algorithm.
We'd have a main server and multiple subordinate devices, which would need to be kept in sync.
I have three candidate algorithms and I'd like advice on which one is best. I'd really appreciate your opinions.
1. A description of the algorithm can be found here. It's a scientific research paper by Sang-Wook Kim
Division of Information and Communications
Hanyang University, Korea
http://goo.gl/yFCHG
2. This algorithm involves maintaining a record of timestamps and version numbers of the databases.
If, for instance, the mobile device has version v10 and the server has v12, and the current timestamp on the mobile device is older than the timestamp on the server, the mobile device needs the changes from v11 and v12.
We denote a deletion by -, an insertion by + and a change by ~.
Suppose the following change logs are associated with these versions:
v11: +r(44), ~r(45), -r(46)
v12: -r(44), ~r(45), +r(47)
Then the overall change in the database is ~r(45) (from v12), +r(47), -r(46).
Hence the record r(44) never needs to be transferred, because it was added and then subsequently deleted. No redundant data is transferred.
The whole algorithm can be found here (I've put it up in a PDF): http://goo.gl/yPC7A
3. This algorithm in effect keeps a table that records the last-change timestamp for each record, and keeps the rows sorted by timestamp. It synchronizes only those rows that have changed. The only con I see here is sorting the table by timestamp each time.
Here's a link http://goo.gl/8enHO
Thanks a ton for your opinions ! :D
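The change-log collapsing described for algorithm 2 can be sketched roughly as follows. This is not the algorithm from the linked PDF, just a minimal illustration of the v11/v12 example. The function names and the rule for "delete then re-insert" are assumptions, and the logs are assumed to be well-formed (for example, no change to an already deleted record).

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, int]  # (operation, record id); operation is "+", "~" or "-"


def fold(prev: Optional[str], op: str) -> Optional[str]:
    """Combine the net operation so far with a later operation on the same record."""
    if prev is None:
        return op
    if prev == "+":
        # Inserted since the last sync: a delete cancels it out,
        # a change still arrives at the client as an insert.
        return None if op == "-" else "+"
    if prev == "-" and op == "+":
        # Assumption: a delete followed by a re-insert is shipped as a change.
        return "~"
    return op  # "~" then "~" stays "~"; "~" then "-" becomes "-"


def net_changes(logs: List[List[Change]]) -> Dict[int, str]:
    """Collapse per-version change logs (oldest first) into one net change set."""
    net: Dict[int, str] = {}
    for log in logs:
        for op, record in log:
            combined = fold(net.get(record), op)
            if combined is None:
                net.pop(record, None)  # nothing needs to be transferred
            else:
                net[record] = combined
    return net


v11 = [("+", 44), ("~", 45), ("-", 46)]
v12 = [("-", 44), ("~", 45), ("+", 47)]
print(net_changes([v11, v12]))  # {45: '~', 46: '-', 47: '+'}
```

Record 44 drops out of the result because it was inserted and deleted between the two sync points, which matches the example in the question.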
I have not been involved in this directly myself, but I have been around when people were working on this sort of thing. Their design was driven not by algorithm analysis or a search for performance, but by hours spent talking to representatives of the end users about what to do when conflicting update requests were received. You might want to work through some use cases with users. It is even possible that users will want different sorts of conflict resolution for different sorts of data in different places.
All the designs here save bandwidth by propagating changes. If something ever causes one side to stop being an exact copy of the other, this inconsistency can persist indefinitely. You could at least detect such a problem by exchanging checksums (SHA-2 or SHA-3 if you are worried enough). One idea is to ask the recipient system for a checksum and then select a package of updates based on that checksum.
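As a rough illustration of the checksum idea, here is a small sketch that hashes a table's contents with SHA-256 (a member of the SHA-2 family) so two sides can compare a single digest. The row representation (a dict from record id to a string value) and the function name are assumptions made for the example; a real system would hash whatever canonical serialization of its rows it can agree on.

```python
import hashlib
from typing import Dict


def table_checksum(rows: Dict[int, str]) -> str:
    """Return a SHA-256 hex digest over the rows in a canonical (sorted) order."""
    digest = hashlib.sha256()
    for key in sorted(rows):
        # Length-prefix each field so that adjacent fields cannot run together.
        for field in (str(key), rows[key]):
            data = field.encode("utf-8")
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


server = {45: "new value", 47: "added"}
mobile_in_sync = {47: "added", 45: "new value"}    # same content, different order
mobile_diverged = {45: "old value", 47: "added"}   # missed an update

print(table_checksum(server) == table_checksum(mobile_in_sync))
print(table_checksum(server) == table_checksum(mobile_diverged))
```

A mismatch only says that the copies differ, not where; finding the differing rows would need per-row or per-range checksums.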

One or two fields to represent the "current step" and "finished or not" state?

Sorry if this question is too silly or neurotic... But I can't figure it out by myself. So I want to see how others deal with it.
My question is:
I want to write a program that shows the progress of some task. So I need to record which state it is currently in, so that someone can check it at any time. There are two methods:
Use two fields to represent the progress state: step and is_finished.
Use just one field: step. For example, if the task needs 5 steps, then 6 means finished. (0 means not started?)
Comparing the two methods:
Two fields:
Seems clearer. Most importantly, logically speaking, aren't step and finished-or-not two separate concepts? I'm not sure about this.
When the task is finished, we change the is_finished field to true (or 1, as you like). But what do we do with the step field then? Add one, or leave it untouched because it no longer has any meaning?
One field:
Simple and space-saving, but not very intuitive. For example, we don't know what 6 really means by just looking at this field, because it may represent either the finished state or a middle step. It needs other information, e.g. the total number of steps, to determine that. And potentially this meaning is not very stable if the total steps will change (the is_finished field in the two-field method would not be affected by this).
So how would you deal with it? Thanks!
UPDATE:
I forgot some point maybe useful in the previous post:
The story is: we provide a web-based service for customers. (The service is time-limited, e.g. a 1-year term.) After a customer purchases it, our deployment program prepares the hardware (a virtual machine) and deploys some software, which takes some time to finish. We want to provide progress info to the customer. When the deployment is finished, the customer should be informed.
Database design:
It needs a usage state field to represent: running normally, running but overdue (expired), stopped. What confuses me is whether it should also include the not-deployed-yet and deploying states.
The progress info should include some other info, e.g. the start time, so we can tell how much time has elapsed since the start. But this info does not need to be persistent, because we won't care about it once the deployment is finished. So I decided to store the progress info in a separate (temporary) table. Then I think another field is needed in a more persistent table to tell whether things are done. So can we combine it into the usage state field mentioned above?
I like the one-field approach better, for the following reasons:
(Assuming you want to search on steps) you can "cover" all steps using only one simple index.
Should you ever want to attach some additional information to each of the steps, the one-field approach can easily accommodate a FOREIGN KEY towards a new table containing that information.
Requires slightly less storage space. Storage is cheap these days, but that's not the point: caching and network performance are.
Two-field approach:
(Assuming you want to search on steps) might require a "fatter" composite index or even two indexes (which takes space, lowers the cache effectiveness and incurs maintenance cost for INSERT/UPDATE/DELETE operations).
Requires a CHECK to defend the database from "impossible" combinations. Funnily enough, some DBMSes don't enforce CHECKs (I'm looking at you, MySQL before 8.0.16).
Requires slightly more storage space (and therefore slightly less of it fits into cache, takes up slightly more network bandwidth etc.).
NOTE: Should you choose to use NULLs, that could have "interesting" consequences under certain DBMSes (for example, Oracle doesn't index NULLs).
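To make the CHECK point for the two-field approach concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite does enforce CHECK constraints). The table name, the 5-step limit and the rule "finished implies step = 5" are assumptions taken from the question's example, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE deployment (
        id          INTEGER PRIMARY KEY,
        step        INTEGER NOT NULL,
        is_finished INTEGER NOT NULL,
        CHECK (step BETWEEN 0 AND 5),
        CHECK (is_finished IN (0, 1)),
        -- The "impossible" combination: finished before reaching the last step.
        CHECK (is_finished = 0 OR step = 5)
    )
""")

# A valid row: step 3 of 5, not finished yet.
conn.execute("INSERT INTO deployment (step, is_finished) VALUES (3, 0)")

# An impossible row: marked finished while still on step 2.
try:
    conn.execute("INSERT INTO deployment (step, is_finished) VALUES (2, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM deployment").fetchone()[0]
print(rejected, row_count)
```

With the one-field approach, the third CHECK is not needed at all, since there is no second column that could contradict the first.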
"For example, we don't know what 6 really means"
That doesn't really matter, as long as the client application knows what it means.
Design the database for applications, not humans.
"And potentially this meaning is not very stable if the total steps will change"
True, but you have the same problem with the two-field approach as well, if a new step is added in the "middle" of the existing steps.
Either UPDATE the table accordingly,
or never change the step values. For example, if the step 5 is the last one, then newly added step 6 is considered earlier despite having greater value - your application (or the additional table I mentioned) will know the order of steps, even if their values are not ordered. If you really want "order by value" without resorting to UPDATE, make the steps: 10, 20, 30 etc, so you can insert new steps in the gaps (the old BASIC line number trick).
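Both options for avoiding an UPDATE can be sketched in a few lines. The step numbers below are made up for illustration: in the first variant the application keeps an explicit order list, so a later-added step 6 can run before step 5; in the second variant the values are spaced by 10 so that plain numeric order keeps working.

```python
# Variant 1: stored step values never change; the application knows the order.
# Step 6 was added later but runs before step 5, which remains the last step.
STEP_ORDER = [1, 2, 3, 4, 6, 5]
POSITION = {step: index for index, step in enumerate(STEP_ORDER)}


def is_earlier(a: int, b: int) -> bool:
    """Return True if step a comes before step b in the application's order."""
    return POSITION[a] < POSITION[b]


# Variant 2: values spaced by 10, so a new step fits into a gap
# and ordering by value still gives the right sequence.
gapped_steps = [10, 20, 30, 40, 50]
gapped_steps.append(45)  # new step between 40 and 50, old rows untouched
gapped_steps.sort()

print(is_earlier(6, 5), gapped_steps)
```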
It remains a matter of taste, but I would suggest the second option of a single int field, step. On inserting a new record, initialize the value of step to 0, which would indicate "not started yet". Any positive integer value would obviously denote the current step. As soon as the trajectory is completed, I would set step to NULL. As you correctly stated, this method does require solid documentation, but I think that it is not too confusing.

In what sequence is cluster analysis done?

First find the minimum frequent patterns from the database.
Then divide them into various data types, like interval-based, binary and ordinal variables, and define distance measures for all the variables.
Finally, apply a cluster analysis method.
Is this sequence right or am I missing something?
Whether you're right or not depends on what you want to do. The general approach that you describe seems to go in the right direction, but you'll never know if you're on target until you answer the following questions:
What is your data?
What are you trying to find? Which cluster method do you want to use?
From what you describe, it seems to me that you want to do 'preprocessing' steps like feature selection and vectorization. Unfortunately, this by itself can be quite challenging. For example, one of the biggest practical problems is the design of a distance function (there's a tremendous amount of research available).
So, please give us more information on your specific target application.
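To illustrate what "designing a distance function" for mixed variable types can look like, here is a small sketch in the spirit of Gower's coefficient: each variable contributes a dissimilarity between 0 and 1, and the contributions are averaged. The variable names, ranges and level counts are invented for the example, and this is only one of many possible choices.

```python
from typing import Dict, Optional, Tuple

# Hypothetical schema: variable name -> (kind, parameter)
SCHEMA: Dict[str, Tuple[str, Optional[float]]] = {
    "age": ("interval", 50.0),     # parameter: value range (max - min)
    "smoker": ("binary", None),    # no parameter needed
    "education": ("ordinal", 3),   # parameter: number of levels, coded 0..2
}


def mixed_distance(a: dict, b: dict) -> float:
    """Average per-variable dissimilarity; each term lies between 0 and 1."""
    total = 0.0
    for name, (kind, param) in SCHEMA.items():
        if kind == "interval":
            total += abs(a[name] - b[name]) / param
        elif kind == "binary":
            total += 0.0 if a[name] == b[name] else 1.0
        elif kind == "ordinal":
            # Rank difference scaled by the largest possible rank difference.
            total += abs(a[name] - b[name]) / (param - 1)
        else:
            raise ValueError(f"unknown variable kind: {kind}")
    return total / len(SCHEMA)


x = {"age": 30, "smoker": True, "education": 0}
y = {"age": 55, "smoker": False, "education": 2}
print(mixed_distance(x, y))
```

A pairwise distance like this can then be fed into any clustering method that accepts a distance matrix, for example hierarchical clustering.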

Database design & table relationships: where would the data go? (Image included)

I need to carry out a data capture exercise, which is looking like a large task, that unfortunately may end up being done in Excel. I believe a database is more suitable, but the structure of it is probably very complicated.
I've created 4 categories per Unit (30 units). Each category has 8 graphs/dimensions. Each graph/dimension has a scale that I've visually broken down into 4 major interval points (Interval1, Interval2, etc). I intend to put a figure in a box that represents the change against these 4 interval points. Therefore, 4 (categories) * 8 (dimensions) = 32. Then 32 * 4 (intervals) = 128. This means that per unit I need to record 128 changes.
...and the best part: there are 3 distinct scales. 4 of the graphs use one scale, 2 use another and the last 2 use a final one.
Like I said, this is a monster of a task, and doing this in Excel is possible, but doesn't give me the flexibility I think I need when it comes to comparing the data.
30 Units (tblInventory)
4 Categories per Unit (tblCategories1, tblCategories2, etc)
8 Dimensions/Graphs per category (Dim1, Dim2, etc)
3 Scales (tblScale1, tblScale2, etc)
I'm trying to figure out where the actual data would be captured. Would I have a single table called tblIntervalData that is related to a linking table that connects to each of the 3 tblScales, which in turn are linked to the tblDimensions?
Below is a screen grab of what I've done, but it doesn't feel right. Your views and advice will be much appreciated.
A higher resolution image can be seen here
I can't see your pic behind my stupid corporate firewall, but...
A). Since Excel only really handles two (arguably three) dimensions of data at all well, it's very unlikely that avoiding the DB route is correct if you have any kind of relations to deal with.
B). Stop using Hungarian notation, by which I mean drop the "tbl" prefixes.
C). I'd agree that sort of sounds like you do want a table (or similar tables) "Intervals" (avoid the word data - everything is data) which will have FK relationships to Units, Scales, etc, but it's hard for me to be sure without seeing your diagram I think. Limited help I know.
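For what it's worth, one possible shape for such an "Intervals" table, sketched with Python's built-in sqlite3 module. All table and column names here are guesses, and the sketch assumes that the same 4 categories and 8 dimensions apply to every unit and that each dimension uses exactly one of the 3 scales; the original diagram may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE scales     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE units      (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE dimensions (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        scale_id INTEGER NOT NULL REFERENCES scales(id)
    );
    -- One row per recorded figure: unit x category x dimension x interval point.
    CREATE TABLE intervals (
        unit_id      INTEGER NOT NULL REFERENCES units(id),
        category_id  INTEGER NOT NULL REFERENCES categories(id),
        dimension_id INTEGER NOT NULL REFERENCES dimensions(id),
        interval_no  INTEGER NOT NULL CHECK (interval_no BETWEEN 1 AND 4),
        change_value REAL,
        PRIMARY KEY (unit_id, category_id, dimension_id, interval_no)
    );
""")

conn.executemany("INSERT INTO scales VALUES (?, ?)",
                 [(i, f"Scale{i}") for i in (1, 2, 3)])
conn.executemany("INSERT INTO units VALUES (?, ?)",
                 [(i, f"Unit{i}") for i in range(1, 31)])
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(i, f"Category{i}") for i in range(1, 5)])
# 4 dimensions use the first scale, 2 the second and 2 the third.
scale_of = [1, 1, 1, 1, 2, 2, 3, 3]
conn.executemany("INSERT INTO dimensions VALUES (?, ?, ?)",
                 [(i, f"Dim{i}", s) for i, s in enumerate(scale_of, start=1)])

# Create an empty slot for every figure that has to be captured.
conn.execute("""
    INSERT INTO intervals (unit_id, category_id, dimension_id, interval_no)
    SELECT u.id, c.id, d.id, nums.n
    FROM units u, categories c, dimensions d,
         (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) nums
""")

per_unit = conn.execute(
    "SELECT COUNT(*) FROM intervals WHERE unit_id = 1").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM intervals").fetchone()[0]
print(per_unit, total)  # 128 3840
```

The scale is reached through the dimension rather than stored on each interval row, so there is no need for a separate linking table per scale.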

Resources