I've just started my residency as a radiation oncologist. I have a little background in programming (Python, VBA).
I'd like your insights on an issue I have at work.
The issue : For each patient, the radiation oncologist needs to do a contouring. Basically, he contours the main structures (like the aorta, the heart, the lungs, and so on) on a CT scan. This is essential for computing the spatial distribution of the radiations (because you want to avoid those structures). The contouring is done within a 3rd party software (called Isogray). The CT scans come from the hospital database and the radiation distribution is computed on another software.
It takes at least one hour to do a complete contouring. Multiply that by each patients (maybe a dozen per week) and by each oncologists (we are a team of 15 members) and you can see that it represents hundred (maybe even thousand) manhours every year.
There exists softwares that do this automatically, but the hospital doesn't want to rent/buy them. But, seriously, how hard can this be to do a little automation ? Can't I do this myself ?
My plan of action : Here I'd like your insights. How can I automate this task ? The first thing is that I can't change anything within Isogray, so I need to do the automation externally. What I think I should do :
Create a database of the historical contourings : this means I need to be able to read what Isogray uses as an output files
Design an automatic model : I'm thinking deep learning models here. I don't know if there's anything more optimal to do than calibrating a deep learning model on the contoured CT scans I already have
Create a little software : based on the automatic model, the software will take a 'not contoured' Isogray file and turn it into a 'contoured' file. The oncologist only needs to load the new file into Isogray and validate the contouring
What do you think ? Do you see an easier way to do that ? I don't know anything about Isogray (I just know how to use it). Do you think this is doable? What information do I need before I start this project ?
Any insights will be welcomed :)
From what I have understood it is a problem of semantic segmentation.
You have an input image of N dimensions (or black and white) and you use the neural network to indicate which regions correspond to a specific organ.
You can use an architecture like the U-Net for this task: https://medium.com/#keremturgutlu/semantic-segmentation-u-net-part-1-d8d6f6005066
What I do not know is if the degree of reliability would be very high, that depends on many factors.
Neural networks look for differentiating patterns to discriminate zones, the first important component is shape and color. That is why it is more difficult when both the color and the shape are very different.
On the other hand you will need a lot of images but you can create a process called data-augmentation to generate more (artificial).
Another method that is currently used is to work in reverse, we know that the problem of image segmentation is difficult. But you can design a program that simulates real images where segmentation is known perfectly.
There are only some keypoints, I hope I have helped you.
EDIT:
Semantic segmentation in biomedic context: https://towardsdatascience.com/review-u-net-biomedical-image-segmentation-d02bf06ca760
You need to provide more background on the specifics on the contouring, especially given the fact that this is for medical diagnosis. Truthfully, I wouldn't try and automate this for liability reasons.
If you make an error someone it could cause a misdiagnosis, which as you already know can lead to numerous problems including lawsuits and death. The nice thing about 3rd party products is that it is already being tested robustly against numerous scenarios and approved for medical usage and liability reasons.
I'm pretty sure you could make a masters thesis doing something like this
With that being said, there is a nice github repo for problems like this that I think you could potentially start generating ideas from.
Related
My requirement is probably close to what one expects of an "Expert System". And looking for the simplest solution, that can give me real-time or near-real time inference, with some offline (non-realtime) learning capabilities.
To elaborate, my problem is --
Watch a log that is being updated live, and classify each entry as Red, Green and Blue.
The classification into Red, Green, Blue is based on logic codified as production-rules (as I imagine it today).
The point where it gets challenging is --
1) Log entries tagged Blue will eventually have to be tagged red / green, based on subsequent log entries, where we hope to have more detailed information, so there is a bit of remembering to be done. The exact duration to wait, isn't known in advance, but there's a max limit. Of course, at any given point in time, there could be several hundred-thousand entries that are tagged Blue.
2) The rules that determine Red & Green are not perfect, so sometimes mistakes happen with labeling. So an occasional manual audit reveals these mistakes. My main challenge is to see if I could automate some part of rule-updating, with minimal programming effort.
My (continuing study) reveals that RETE algorithm based rule-engine might serve my classification & labeling, including the re-labelling. If that works, I still need to figure how to automate the part of "learning from mistakes" ? Can one take a statistical approach -- s.a. Bayesian classification ? Also, could one take the Bayesian classification completely as against Rules-Engine, for the initial classification s.t. I've manually trained the system sufficiently ? Bayesian approach seems to "dumb down" the task of maintaining a correct set of rules, by "trust the statistics" approach, especially as there are these periodic manual audits.
PS> My main application is written in C++ (if that matters).
This sounds like Complex Event Processing (CEP), where you have rules and the ability to use time calculations like event X is within 2 minutes after event y.
In Java land, Drools Fusion (or Drools Expert) would handle that really well (I am biased though). In C++ land... well maybe you can set up a drools-camel-server and communicate through XML with it.
I remember when I was in college we went over some problem where there was a smart agent that was on a grid of squares and it had to clean the squares. It was awarded points for cleaning. It also was deducted points for moving. It had to refuel every now and then and at the end it got a final score based on how many squares on the grid were dirty or clean.
I'm trying to study that problem since it was very interesting when I saw it in college, however I cannot find anything on wikipedia or anywhere online. Is there a specific name for that problem that you know about? Or maybe it was just something my teacher came up with for the class.
I'm searching for AI cleaning agent and similar things, but I don't find anything. I don't know, I'm thinking maybe it has some other name.
If you know where I can find more information about this problem I would appreciate it. Thanks.
Perhaps a "stigmergy" approach is closely related to your problem. There is a starting point here, and you can find something by searching for "dead ants" and "robots" on google scholar.
Basically: instead of modelling a precise strategy you work toward a probabilistic approach. Ants (probably) collect their deads by piling up according to a simple rule such as "if there is a pile of dead ants there, I bring this corpse hither; otherwise, I'll make a new pile". You can start by simplifying your 'cleaning' situation with that, and see where you go.
Also, I think (another?) suitable approach could be modelled with a Genetic Algorithm using a carefully chosen combination of fitness functions such as:
the end number of 'clean' tiles
the number of steps made by the robot
of course if the robots 'dies' out of starvation it automatically removes itself from the gene pool, a-la darwin awards :)
You could start by modelling a very, very simple genotype that will be 'computed' into a behaviour. Consider using a simple GA such as this one by Inman Harvey, then to each gene assign either a part of the strategy, or a complete behaviour. E.g.: if gene A is turned to 1 then the robot will try to wander randomly; if gene B is also turned to 1, then it will give priority to self-charging unless there are dirty tiles at distance X. Or use floats and model probability. Your mileage may vary but I can assure it will be fun :)
The problem is reminiscent of Shakey, although there's cleaning involved (which is like the Roomba -- a device that can also be programmed to perform these very tasks).
If the "problem space" (or room) is small enough, you can solve for an optimal solution using a simple A*-based search, but likely it won't be, since that won't leave for very interesting problems.
The machine learning approach suggested here using genetic algorithms is an interesting approach. Given the problem domain you would only have one "rule" (a move-to action, since clean could be eliminated by implicitly cleaning any square you move to that is dirty) so your learner would essentially be learning how to move around an environment. The problem there would be to build a learner that would be adaptable to any given floor plan, instead of just becoming proficient at cleaning a very specific space.
Whatever approach you have, I'd also consider doing a further meta-reasoning step if the problem sets are big enough, and use a partition approach to divide the floor up into separate areas and then conquering them one at a time.
Can you use techniques to create data to use "offline"? In that case, I'd even consider creating a "database" of optimal routes to take to clean certain floor spaces (1x1 up to, say, 5x5) that include all possible start and end squares. This is similar to "endgame databases" that game AIs use to effectively "solve" games once they reach a certain depth (c.f. Chinook).
This problem reminds me of this. A similar problem is briefly mentioned in the book Complexity as an example of a genetic algorithm. These versions are simplified though, they don't take into account fuel consumption.
I am trying to do the following:
we are trying to design a fraud detection system for stock market.
I know the Specification for the frauds (they are like templates).
so I want to know if I can design a template, and find all records that match this template.
Notice:
I can't use the traditional queries cause the templates are complex
for example one of my Fraud is circular trading,it's like this :
A bought from B, and B bought from C, And C bought from A (it's a cycle)
and this cycle can include 4 or 5 persons.
is there any good suggestion for this situation.
I don't see why you can't use "traditional queries" as you've stated. SQL can be used to write extraordinarily complex queries. For that matter I'm not sure that this is a hugely challenging question.
Firstly, I'd look at the behavior you have described as vary transactional, therefore I treat the transactions as a model. I'd likely have a transactions table with some columns like buyer, seller, amount, etc...
You could alternatively have the shares as its own table and store say the previous 100 owners of that share in the same table using STI (Single Table Inheritance) buy putting all the primary keys of the owners into an "owners" column in your shares table like 234/823/12334/1234/... that way you can do complex queries and see if that share was owned by the same person or look for patterns in the string really easily and quickly.
-update-
I wouldn't suggest making up a "small language" I don't see why you'd want to do something like that when you have huge selection of wonderful languages and databases to choose from, all of which have well refined and tested methods to solve exactly what you are doing.
My best advice is pop open your IDE (thumbs up for TextMate) and pick your favorite language (Ruby in my case). Find some sample data and create your database and start writing some code! You can't go wrong trying to experiment like this, it'll will totally expose better ways to go about it than we can dream up here on Stackoverflow.
Definitely Data Mining. But as you point out, you've already got the models (your templates). Look up fraud DETECTION rather than prevention for better search results?
I know a some banks use SPSS PASW Modeler for fraud detection. This is very intuitive and you can see what you are doing as you play around with the data. So you can implement your templates. I agree with Joseph, you need to get playing, making some new data structures.
Maybe a timeseries model?
Theoretically you could develop a "Small Language" first, something with a simple syntax (that makes expressing the domain - in your case fraud patterns - easy) and from it generate one or more SQL queries.
As most solutions, this could be thought of as a slider: at one extreme there is the "full Fraud Detection Language" at the other, you could just build stored procedures for the most common cases, and write new stored procedures which use the more "basic" blocks you wrote before to implement the various patterns.
What you are trying to do falls under the Data Mining umbrella, so you could also try to learn more about it: maybe you can find a Data Mining package for your specific DB (you didn't specify) and see if it helps you finding common patterns in your data.
I don't see an answer to this question here on SO which makes me afraid that it's incredibly simple and I'm just missing something but here goes.
Background, feel free to skip: I need a single course for my bachelor's degree that I skipped out on years ago. Theoretically it's Computer Graphics, but since I left it has become more Game Development. And that's great because to me it's more interesting than the fill algorithms and translations and whatnot that it used to be. It's a 4th year course only offered every other year, but I've managed to talk the department into letting me take a 4th year independent study on the same topic and call that good enough.
The prof "running" the independent study doesn't teach the actual Computer Graphics course so while he's a smart guy this isn't really his field. So most of my questions are left to me, a text book and the internet. You know...like an independent study should be. :)
/Background
I've got a buddy that likes to develop game systems for fun. I plan to take one of his table top games and make it into a computer game using XNA.
I don't foresee any insurmountable challenges with the game mechanics but one thing I'm curious about is how do most games save their content? I mean that in a couple of ways and hopefully I can express them clearly.
Take the case of any RPG you've ever played. You can hit the "Save" button and save the world, your character's information and whatever other information is necessary. Then later on you can hit the "Load" button and bring it back.
Or the case of NPC dialogue. When I bump into Merchant #853 he randomly spits out one of 3 different greetings.
There are others that I can think of but they're really just variations on the same theme. Even with those two examples it seems to me the same mechanic could be used, but what is that mechanic?
I've been doing web development for years so my mind automatically jumps to "databases!". A database is the solution to any problem. And I can see how it could work here but the overhead seems pretty steep. "Here's my 6mb compiled game...oh and 68mb MySQL installation." Or even worse since I'm using XNA, maybe I'd need to find a way to bundle SQL Server. :)
I thought maybe XML but that doesn't feel right to me either. How would it work if I wanted to run on the XBox? Or Zune? (Those aren't necessary for what I'm doing, but there must be a solution somewhere that takes them into account.)
Anyone know the secret? Or have some ideas anyway?
Thanks
Jeff
There are two main ways how games are saved, a simple one and a complex one. The first way is to simply stores the current level, the current score and a handful of other stats. This is seen in games such as Super Mario Galaxy and most earlier console module based games. The save game doesn't restore your exact position, but just which levels you have completed. These save games are generally very simple and require very little memory.
The second way not only stores your overall progress, but stores each and every little detail, such as enemy positions, their current animation frame and so on, so that loading a save game will place you at the exact spot where you stopped, with all the enemies right in place, instead of back at the start of a level. These savegames tend to get much bigger than the other version and thus are mostly seen on PC games.
Databases are used in neither of these schemes, as the purpose of databases is to provide the ability to dynamically query data structures, what the game however needs isn't a way to query individual pieces, but just a way to statically store them. When a savegame is loaded, it is loaded completly into memory and from there on the game engine does its thing with the data. There are a handful of exceptions, such as MMORPGs which might work on a database, but single player games generally don't.
How the data is actually stored depends on the game. Most common seem to be simple binary data formats, as they are much better in terms of disk space than XML. In older games those binary formats where often raw dumps of a pieces of memory of the games process, so they didn't have any well thought out structure and often broke when a patch or a different version of a game got released, in some modern games that's still the case. XML can be used as too, as well as any other text based file format.
In large part this is more a game design issue than a programming one, as they way a game can be saved can drastically change how its played. The simple way, where you just save the level number and some stats, is however a lot easier to implement, as its just a few lines of of code. While the second one requires serialization of most of your classes, which for a complex game can be quite a tricky issue and lead to many subtle bugs.
One approach is to use .net serialization.
Make sure the state of you game is a fully connected graph and that each class in that graph is marked as Serializable (with the SerializableAttribute), the for saving (and loading) you can use normal .net serialization.
You can look at the codebase for Project Xenocide (open source XNA game) to see how it was done there.
You could use an SQLite database, with the SQLite.NET wrapper. I've used this, and found it quite simple. The whole DLL is only 850KB, and the database itself sits in a single file (with temp files created as needed). So your users shouldn't have an issue.
But you could also use a simple XML file, or a home-grown binary format. It all depends on how you're going to be querying the data, and how much data is involved. There is no one answer.
As others have noted, serialization is the way to go. And Gamasutra just published an article on data baking.
From my limited experience developing games, save games really don't use much storage. As tvanfosson said, you normally store most things in memory while playing the game, so saving state to disk isn't a problem.
Here's a short example. Assuming a single person RPG, if you needed to save your character's location only, you'd have perhaps a level number, xyz coordinates and maybe the direction you're facing. That's just a few bytes.
Now assume you need to save the state/location of things like health packs, crates, enemies, character's health and picked up items, etc. You could have a few hundred of these at most which would easily be less than 10KB.
Obviously things can get very complicated with more complex games. The trick is to only store what is truly necessary to recreate the player's experience. A lot of games only let you save at certain places, like the end of a level. In this case you only need to store the new level number plus the outcome of previous levels (e.g. health remaining, picked up items).
Even if you allow arbitrary save points you can ignore the state of any places/levels that you cannot return to. And you probably wouldn't want the user to be able to save mid-jump.
EDIT: With regard to file format... use any way that's convenient for the data type! XML is quite a nice way of doing things. Not sure how effective a database would be since for an RPG each fragment of data can be very different; You might end up with a bunch of tables with one row each.
Most games use their own, binary, file formats. Firstly this reduces the storage amount dramatically. Secondly, it helps prevent users cheating by editing the save game manually - if you have XML like <health value="10"/> it's very easy to edit the file to read <health value="100"/>. The downside of binary is that it's much more difficult for debugging.
While the game is running, I'd try to keep everything relating to the current context in memory. Your initialization can be kept in some suitable serialized format and read in on start up. XML would work, but it's somewhat verbose. A custom compact binary format is probably more appropriate. The same is true of the saved state. Whatever objects need to be reinitialized when the saved game is loaded should be serialized to a custom binary format and then reconstituted on load. If you run into memory problems, a small custom database optimized for speed would be another alternative. It could be pre-populated on installation.
our webapp collects huge amount of data about user actions, network business, database load, etc etc etc
All data is stored in warehouses and we have quite a lot of interesting views on this data.
if something odd happens chances are, it shows up somewhere in the data.
However, to manually detect if something out of the ordinary is going on, one has to continually look through this data, and look for oddities.
My question: what is the best way to detect changes in dynamic data which can be seen as 'out of the ordinary'.
Are bayesan filters (I've seen these mentioned when reading about spam detection) the way to go?
Any pointers would be great!
EDIT:
To clarify the data for example shows a daily curve of database load.
This curve typically looks similar to the curve from yesterday
In time this curve might change slowly.
It would be nice that if the curve from day to day changes say within some perimeters, a warning could go off.
R
Take a look at Control Charts, they provide a way to track changes in your data visually and specify when the data is "out of control" or "anomalous". They are heavily used in manufacturing to ensure quality control.
This question is impossible to answer without knowing much more about the particular data you have. For an overview of what kinds of approaches exist, see Anomaly Detection: A Survey by Chandola, Banerjee, and Kumar.
Bayesian classification might help you find some anomalies in your data, depending on the type of data and how good you train your Bayesian filter.
There is even one available as a web service # uClassify.com.
This depends so much on what the data is. Take a statistics class and learn the basics first. This isn't usually an easy or simple problem.