I've just started my residency as a radiation oncologist. I have a little background in programming (Python, VBA).
I'd like your insights on an issue I have at work.
The issue: For each patient, the radiation oncologist needs to do the contouring. Basically, they contour the main structures (like the aorta, the heart, the lungs, and so on) on a CT scan. This is essential for computing the spatial distribution of the radiation (because you want to avoid those structures). The contouring is done within a 3rd-party software package (called Isogray). The CT scans come from the hospital database, and the radiation distribution is computed in yet another piece of software.
A complete contouring takes at least one hour. Multiply that by the number of patients (maybe a dozen per week) and by the number of oncologists (we are a team of 15), and you can see that it represents hundreds (maybe even thousands) of man-hours every year.
Software exists that does this automatically, but the hospital doesn't want to rent or buy it. But, seriously, how hard can a little automation be? Can't I do this myself?
My plan of action: Here I'd like your insights. How can I automate this task? The first thing is that I can't change anything within Isogray, so I need to do the automation externally. What I think I should do:
Create a database of the historical contourings: this means I need to be able to read whatever file format Isogray uses as output (see the sketch after this list)
Design an automatic model: I'm thinking deep learning models here. I don't know if there's anything better than training a deep learning model on the contoured CT scans I already have
Create a little piece of software: based on the model, it will take a 'not contoured' Isogray file and turn it into a 'contoured' one. The oncologist only needs to load the new file into Isogray and validate the contouring
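For step 1, my understanding is that many treatment-planning systems can export contours as DICOM RT-STRUCT files. If Isogray can too (I haven't checked), something like this pydicom sketch is the kind of reader I have in mind. The file name is hypothetical; the fields come from the DICOM standard, not from anything Isogray-specific:

```python
# Minimal sketch, assuming contours can be exported as DICOM RT-STRUCT.
# The file path is hypothetical; the attributes are standard DICOM tags.
import pydicom
import numpy as np

ds = pydicom.dcmread("patient001_rtstruct.dcm")  # hypothetical export

# Map ROI numbers to organ names (e.g. "Heart", "Lung_L", "Aorta").
roi_names = {roi.ROINumber: roi.ROIName for roi in ds.StructureSetROISequence}

contours = {}
for roi_contour in ds.ROIContourSequence:
    name = roi_names[roi_contour.ReferencedROINumber]
    slices = []
    for contour in getattr(roi_contour, "ContourSequence", []):
        # ContourData is a flat list: x1, y1, z1, x2, y2, z2, ...
        points = np.array(contour.ContourData, dtype=float).reshape(-1, 3)
        slices.append(points)
    contours[name] = slices

for name, slices in contours.items():
    print(f"{name}: {len(slices)} contour slices")
```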
What do you think? Do you see an easier way to do this? I don't know anything about Isogray's internals (I just know how to use it). Do you think this is doable? What information do I need before I start this project?
Any insights will be welcomed :)
From what I understand, this is a semantic segmentation problem.
You have an input image with N channels (or grayscale) and you use a neural network to indicate which regions correspond to a specific organ.
You can use an architecture like the U-Net for this task: https://medium.com/@keremturgutlu/semantic-segmentation-u-net-part-1-d8d6f6005066
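To give an idea of the shape of such a network, here is a heavily cut-down U-Net-style model in PyTorch; this is just a sketch with two resolution levels instead of the usual four or five, and all layer sizes are illustrative, not tuned:

```python
# Cut-down U-Net-style network (2 levels) in PyTorch; sizes are illustrative.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.enc1 = double_conv(1, 32)        # encoder level 1 (grayscale CT slice)
        self.enc2 = double_conv(32, 64)       # encoder level 2
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = double_conv(64, 32)       # 64 = 32 upsampled + 32 from skip
        self.head = nn.Conv2d(32, n_classes, 1)  # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return self.head(d1)  # (batch, n_classes, H, W) logits

net = TinyUNet(n_classes=4)  # e.g. background, heart, lungs, aorta
out = net(torch.randn(1, 1, 256, 256))
print(out.shape)  # torch.Size([1, 4, 256, 256])
```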
What I don't know is whether the reliability would be high enough; that depends on many factors.
Neural networks look for distinguishing patterns to discriminate regions, and the first important cues are shape and intensity. That is why the task is harder when both vary a lot.
On the other hand, you will need a lot of images, but you can use a process called data augmentation to generate more (artificial) ones.
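For example, a minimal augmentation sketch: a random flip plus a small rotation, applied identically to the image and its mask. The parameters are illustrative:

```python
# Minimal sketch of data augmentation for segmentation: the same random
# transform must be applied to the image and its label mask together.
import numpy as np
from scipy.ndimage import rotate

def augment(image, mask, rng=None):
    rng = rng or np.random.default_rng()
    # Random horizontal flip.
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    # Small random rotation; order=0 keeps mask labels integer-valued.
    angle = rng.uniform(-10, 10)
    image = rotate(image, angle, reshape=False, order=1)
    mask = rotate(mask, angle, reshape=False, order=0)
    return image.copy(), mask.copy()

img = np.random.rand(256, 256)                            # stand-in CT slice
msk = (np.random.rand(256, 256) > 0.9).astype(np.int64)   # stand-in mask
aug_img, aug_msk = augment(img, msk)
print(aug_img.shape, set(np.unique(aug_msk)))
```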
Another method currently in use is to work in reverse: we know image segmentation is a difficult problem, but you can design a program that simulates realistic images for which the segmentation is known perfectly.
These are only some key points; I hope I have helped you.
EDIT:
Semantic segmentation in a biomedical context: https://towardsdatascience.com/review-u-net-biomedical-image-segmentation-d02bf06ca760
You need to provide more background on the specifics of the contouring, especially given the fact that this is for medical diagnosis. Truthfully, I wouldn't try to automate this, for liability reasons.
If you make an error, it could cause a misdiagnosis, which, as you already know, can lead to numerous problems, including lawsuits and death. The nice thing about 3rd-party products is that they have already been tested robustly against numerous scenarios and approved for medical use, for liability reasons.
I'm pretty sure you could make a master's thesis out of something like this.
With that being said, there is a nice GitHub repo for problems like this that I think you could start generating ideas from.
Firstly I apologise that this is a bit broad. If there is a better Stack Exchange board, please point me to it :)
I am looking for a way to run a formula hundreds of times (or just continuously), where a user can adjust the inputs and see the results, ideally in real time.
The idea being that I want to look at creating a formula for two players fighting in an RPG game. They will each have some stats, and I want to see what % win rate each has, based on a formula of my own devising.
I know PHP and Javascript, some Java and C# if that helps :)
I have started to tinker with a basic JS script, but I'd like to present a bar showing the % win/loss rates and, again, have the interactive element if possible (rather than having to set the values, run it, read the results, rinse and repeat).
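To make the idea concrete, here is the kind of Monte Carlo loop I mean, sketched in Python (the stats and the damage formula are placeholders for my own design); it ports directly to JS:

```python
# Monte Carlo estimate of win rate for a made-up combat formula.
# The stats and damage formula are placeholders, not a real design.
import random

def damage(attacker, defender):
    # Hypothetical formula: attack minus defense, plus some spread.
    return max(1, attacker["atk"] - defender["def"] + random.randint(-2, 2))

def fight(a, b):
    hp_a, hp_b = a["hp"], b["hp"]
    while True:
        hp_b -= damage(a, b)
        if hp_b <= 0:
            return "a"
        hp_a -= damage(b, a)
        if hp_a <= 0:
            return "b"

def win_rate(a, b, trials=100_000):
    wins = sum(fight(a, b) == "a" for _ in range(trials))
    return wins / trials

knight = {"hp": 50, "atk": 10, "def": 4}
rogue = {"hp": 40, "atk": 12, "def": 2}
print(f"knight win rate: {win_rate(knight, rogue):.1%}")
```

In the browser I'd run a small batch of fights per animation frame and redraw the bar, so the estimate refines live while the user drags the stat sliders.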
I have to develop a personality/job-suitability online test for an HR department. Basically, users will answer questions on a scale of 0-10, for example, and after, say, 50 questions, I want to translate that into a rating on 5 different personality/job-suitability characteristics.
I don't have any real data to start with, so first: is it even worth it to use a recommendation engine like MyMediaLite (GitHub)? How many samples will I need to train it to decent performance?
I previously built a training-course recommender by simply doing a hand-weighted sum, where each question increased the weight of several courses related to that question. It was an expert system, built like a feed-forward neural network, where I personally tuned all the weights based on my knowledge of the questions and the courses' content.
This time around I would like to use a recommender system, but I'm wondering how many times I would have to take the 50-question test and assign the results manually. Would 100 examples do? That could be feasible. 1,000 would take too long. How can I know ahead of time?
Though perhaps unhelpful, I have to say it is not possible to give a definite number. You should focus on the learning curve as you add new samples.
You can process the samples by hand and by the engine in parallel, and compare the results given by both. Once measurements such as precision and recall of the engine's output meet your expectations, you have enough samples.
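As a sketch of what watching that learning curve could look like (with scikit-learn and synthetic stand-in data; the classifier is a placeholder for whatever engine you actually use, and you can swap in precision/recall via the scoring parameter):

```python
# Sketch: watch a learning curve flatten out to judge when you have
# enough hand-labelled tests. The data here is a synthetic stand-in.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.integers(0, 11, size=(200, 50))       # 200 tests, 50 answers each (0-10)
y = (X[:, :10].mean(axis=1) > 5).astype(int)  # stand-in "suitability" label

sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, train_sizes=np.linspace(0.2, 1.0, 5), cv=5)

for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:4d} samples -> cross-validated accuracy {score:.2f}")
# When adding samples stops improving the score, more labelling is
# unlikely to help; if the curve is still rising, keep labelling.
```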
Hope this helps!
I remember when I was in college we went over a problem where a smart agent was on a grid of squares and had to clean the squares. It was awarded points for cleaning and deducted points for moving. It had to refuel every now and then, and at the end it got a final score based on how many squares on the grid were dirty or clean.
I'm trying to study that problem, since it was very interesting when I saw it in college, but I cannot find anything on Wikipedia or anywhere else online. Is there a specific name for that problem that you know of? Or maybe it was just something my teacher came up with for the class.
I've been searching for "AI cleaning agent" and similar phrases, but I don't find anything. I'm thinking maybe it has some other name.
If you know where I can find more information about this problem I would appreciate it. Thanks.
Perhaps a "stigmergy" approach is closely related to your problem. There is a starting point here, and you can find more by searching for "dead ants" and "robots" on Google Scholar.
Basically: instead of modelling a precise strategy, you work toward a probabilistic approach. Ants (probably) collect their dead by piling them up according to a simple rule such as "if there is a pile of dead ants there, I bring this corpse hither; otherwise, I'll make a new pile". You can start by simplifying your 'cleaning' situation like that and see where you go.
Also, I think (another?) suitable approach could be modelled with a Genetic Algorithm using a carefully chosen combination of fitness functions such as:
the end number of 'clean' tiles
the number of steps made by the robot
of course, if the robot 'dies' of starvation, it automatically removes itself from the gene pool, à la Darwin Awards :)
You could start by modelling a very, very simple genotype that will be 'computed' into a behaviour. Consider using a simple GA such as this one by Inman Harvey, then assign to each gene either a part of the strategy or a complete behaviour. E.g.: if gene A is set to 1, the robot will wander randomly; if gene B is also set to 1, it will give priority to recharging unless there are dirty tiles within distance X. Or use floats and model probabilities. Your mileage may vary, but I can assure you it will be fun :)
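To make that concrete, here is a toy sketch of the genotype-to-fitness loop. Everything here (grid size, encoding, fitness weights) is made up for illustration, and the genome decodes to a blind move sequence rather than the conditional behaviours described above, purely to keep it short:

```python
# Toy GA for a grid-cleaning agent: a bitstring genotype is decoded into
# a movement policy; fitness rewards cleaned tiles and penalises steps.
import random

GRID, STEPS, POP, GENS = 8, 60, 30, 40
GENOME_BITS = 60  # 2 bits per move; the move sequence repeats after 30 steps

def simulate(genome):
    rng = random.Random(0)  # fixed dirt layout, so fitness is deterministic
    dirty = {(rng.randrange(GRID), rng.randrange(GRID)) for _ in range(12)}
    x = y = cleaned = steps = 0
    for t in range(STEPS):
        if (x, y) in dirty:
            dirty.discard((x, y))
            cleaned += 1
        if not dirty:                 # everything clean: stop early, save steps
            break
        g = (genome >> (2 * (t % 30))) & 0b11   # decode 2 bits into one move
        dx, dy = ((0, -1), (0, 1), (-1, 0), (1, 0))[g]
        x = min(max(x + dx, 0), GRID - 1)
        y = min(max(y + dy, 0), GRID - 1)
        steps += 1
    return cleaned * 10 - steps       # fitness: cleaned tiles minus step cost

def evolve():
    pop = [random.getrandbits(GENOME_BITS) for _ in range(POP)]
    for _ in range(GENS):
        parents = sorted(pop, key=simulate, reverse=True)[: POP // 2]
        children = []
        for _ in range(POP - len(parents)):
            a, b = random.sample(parents, 2)
            mask = random.getrandbits(GENOME_BITS)        # uniform crossover
            child = (a & mask) | (b & ~mask)
            child ^= 1 << random.randrange(GENOME_BITS)   # point mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=simulate)

best = evolve()
print("best fitness:", simulate(best))
```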
The problem is reminiscent of Shakey, although there's cleaning involved (which is like the Roomba -- a device that can also be programmed to perform these very tasks).
If the "problem space" (or room) is small enough, you can solve for an optimal solution using a simple A*-based search; but it likely won't be, since small spaces don't make for very interesting problems.
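For concreteness, "A*-based" here could mean searching over states of (robot position, set of dirty squares left), with the number of remaining dirty squares as an admissible heuristic, since each move can clean at most one square. A sketch on a made-up 4x4 instance:

```python
# A* over (position, frozenset of remaining dirty squares) finds an optimal
# cleaning route on a small grid. Heuristic = number of dirty squares left.
import heapq
import itertools

GRID = 4
dirt = frozenset({(1, 1), (3, 0), (2, 3)})
tie = itertools.count()  # tiebreaker so the heap never compares states

def neighbours(pos):
    x, y = pos
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < GRID and 0 <= y + dy < GRID:
            yield (x + dx, y + dy)

def astar(start_pos, dirt):
    start = (start_pos, dirt)
    frontier = [(len(dirt), 0, next(tie), start)]  # (f, g, tie, state)
    best_g = {start: 0}
    while frontier:
        _, g, _, (pos, left) = heapq.heappop(frontier)
        if not left:
            return g                                # all squares cleaned
        for npos in neighbours(pos):
            nstate = (npos, left - {npos})          # stepping on dirt cleans it
            ng = g + 1
            if ng < best_g.get(nstate, float("inf")):
                best_g[nstate] = ng
                heapq.heappush(
                    frontier, (ng + len(nstate[1]), ng, next(tie), nstate))

print("optimal number of moves:", astar((0, 0), dirt))  # -> 9
```

The catch is that the state space grows exponentially with the number of dirty squares, which is exactly why this only works for small rooms.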
The genetic-algorithm approach suggested here is interesting. Given the problem domain, you would only have one "rule" (a move-to action, since cleaning can be made implicit by cleaning any dirty square you move to), so your learner would essentially be learning how to move around an environment. The challenge there would be to build a learner that adapts to any given floor plan, instead of just becoming proficient at cleaning one very specific space.
Whatever approach you take, I'd also consider a further meta-reasoning step if the problem instances are big enough: use a partitioning approach to divide the floor into separate areas and then conquer them one at a time.
Can you use techniques to create data to use "offline"? In that case, I'd even consider creating a "database" of optimal routes for cleaning certain floor spaces (1x1 up to, say, 5x5) that includes all possible start and end squares. This is similar to the "endgame databases" that game AIs use to effectively "solve" games once they reach a certain depth (cf. Chinook).
This problem reminds me of this. A similar problem is briefly mentioned in the book Complexity as an example of a genetic algorithm. These versions are simplified, though; they don't take fuel consumption into account.
It recently emerged that on a large poker site, some players were possibly able to see all opponents' cards as they played, by exploiting a security vulnerability that was discovered.
A naïve cheater would win at an incredibly fast rate; these cheats are usually caught very quickly, and if not caught quickly, they are easy to detect with a quick scan through their hand histories.
The more difficult problem occurs when the cheater plays intelligently: bluffing in spots where they are bound to be called, calling river bets with the worst hands. The basic premise is that they lose pots on purpose to disguise their ability to see other players' cards, and they win at a reasonably realistic rate.
Given:
A data set of millions of verified and complete information hand histories
Theoretical unlimited computer power
Assume the game is No Limit Hold'em, although suggestions for Omaha or limit poker may be beneficial
How could we reasonably accurately classify these cheaters? The original 2+2 thread appeals for ideas, and I thought that the SO community might have some useful suggestions.
It's an interesting problem also because it is current, and it has real application in bettering the world if someone finds a creative solution, as there is a good chance genuine players will have funds refunded to them once cheaters are identified.
Plot VPIP versus winrate for all players with a statistically significant number of hands played. You should see the outliers with the naked eye. I think that's the basic thing to do first.
Then you can plot WTSD vs winrate, winrate at showdown vs winrate without showdown, %won at showdown vs VPIP.
The stats you choose must be statistically significant. If you know poker, the above choices make sense.
This is not a job for a machine; outliers are detected by eye.
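For instance, producing the first plot could be as simple as the following sketch, assuming the hand histories have already been aggregated per player into a CSV; the file name and column names are hypothetical:

```python
# Sketch: scatter VPIP against winrate for players with enough hands,
# then look for outliers by eye. File and column names are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("player_stats.csv")   # hypothetical per-player aggregates
df = df[df["hands"] >= 50_000]         # statistically meaningful samples only

plt.scatter(df["vpip"], df["winrate_bb100"], s=10, alpha=0.5)
plt.xlabel("VPIP (%)")
plt.ylabel("Winrate (bb/100 hands)")
plt.title("VPIP vs winrate, players with >= 50k hands")
plt.show()
```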
EDIT: Omaha is much tougher, since the variance is much higher. There are cases of unbelievable streaks by weak players who were not cheating.
I hate to be so blunt, but all the answers on this page, with the exception of @Erwin Smout's, are worthless.
Statistical analysis is a joke for identifying poker cheats
I realize the question allows there to be millions of hands' worth of history available to the system. I'm sure there are players with hand histories this large; hell, I've probably played this many online hands. But I've also been playing online for over 10 years. That's not a small amount of time, and it is my understanding that two conflicting things are true when it comes to identifying online poker cheaters: it needs to happen in a small amount of time, and, like any good thief, an online poker cheat is going to take his stash elsewhere immediately after the taking.
There was a great example of the variance in poker in this paper, generated by matching an always-raise player against an always-call player (page 13 of the PDF). Over the course of 100,000 hands, way more than I think most people would be willing to play against someone who could see their cards, the always-call player won on average 0.026 small blinds per hand. I know this does not sound like much, but 0.026 small blinds per hand over 100,000 hands is 2,600 small blinds, and at stakes of $5-10 (a $2.50 small blind) that comes out to $6,500. Maybe someone can help me find the link, but the measured professional win rate is not much larger than this. Please note, NEITHER of these players was cheating, and the statistically expected difference over this number of hands is significantly less than what actually transpired.
What online poker players need to understand
Poker is gambling. It is a game of skill, because some players are able to elicit more information from their opponents than their opponents are able to gather, and that extra information is often as useful as seeing other people's cards. Even players who are better than their typical opponents will end up long-term losers. If you do not understand this, you're just searching for witches with statistics, in the arbitrarily small number of hands you'll be playing against any one opponent.
What can be done?
Keeping in mind the question states that cheaters are able to see the other players' cards, you don't need statistical analysis to identify them. There are only three ways that is possible.
First, the server is intentionally sending the information to clients, which is an obvious security issue and should not be implemented (IMO, even for moderators). If a site were found to allow this, it is the players' responsibility to move their funds elsewhere, or to refuse to play on the site until that terrible design decision is rectified. It should also be the responsibility of sites to inform their players of the exact steps that take place during hands played on the site, so players have that information when choosing a site in the first place. Security by obscurity is unacceptable. As for catching the thieves, this information should be sitting in log files on their servers, which should be regularly audited for this type of behavior.
Second, the user has hacked the poker server. They would know about that in a hurry, or else, once it is exposed, it is again the players' responsibility to decide where to play. In this case, the cheater can be prosecuted in most countries.
Lastly, it is possible the dealing algorithm has been cracked. This was a major problem in the past with companies that used naive methods to deal hands, but most of the major shops solved it by taking random inputs from players logged into their system, as well as using entropy-generating hardware, to seed their random number generator. That's not to say it cannot be cracked, however. If this is the case, the only option is for the company to engineer a new random number generator.
Well. IT people get fascinated by all kinds of wrong questions.
A better question is: "how is cheating even possible?". There is no need whatsoever to send an opponent's hand over the wire until showdown. If that data isn't sent to the client, then how could they cheat?
They'd need to break into the server. Don't tell me that isn't preventable.
I think if they cheat intelligently, winning not too many rounds, it won't be detectable. I don't believe you could tell the difference between luck and cheating here.
But I would like to know at which online poker provider this cheating is possible, because I can't imagine how to do it if the poker software is coded properly. If I were asked to program online poker software, users wouldn't be able to see their opponents' cards, because there would be no way to get that information. This is how I would do it:
Every connection between users and server is encrypted
No communication between users; the users can only talk to the server.
The server tells every user only the cards that user should see, and no other cards, until the round is finished and the users reveal their cards (see the sketch below).
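A sketch of that last point: the server holds the full deal but serializes a different, filtered view per seat. The types and field names here are made up:

```python
# Sketch of per-player filtering: the server's full game state is never
# sent; each client gets a view containing only what that seat may see.
from dataclasses import dataclass

@dataclass
class ServerHand:
    hole_cards: dict[str, list[str]]  # seat -> two cards, server-side only
    board: list[str]
    showdown: bool = False

def view_for(state: ServerHand, seat: str) -> dict:
    return {
        "your_cards": state.hole_cards[seat],
        "board": state.board,
        # Opponents' cards appear only once the hand reaches showdown.
        "opponents": {
            s: (cards if state.showdown else ["??", "??"])
            for s, cards in state.hole_cards.items() if s != seat
        },
    }

hand = ServerHand({"alice": ["As", "Kd"], "bob": ["7h", "7c"]},
                  ["2s", "9d", "Qh"])
print(view_for(hand, "alice"))  # bob's cards are hidden: ['??', '??']
```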
The only way users could cheat here is to get together with other players, or to impersonate multiple players with different accounts and IP addresses, and open another channel to communicate between the players. This way the group has a big advantage, because they know more than their own cards, but there's still no way they can see others' cards. And because it's now a group that is cheating, it is even harder to detect, because they can share their winnings among multiple players, and the group could even include a player who loses more than they gain and still come out ahead overall.
I doubt you can say with any certainty if someone is cheating or if they are just good at Poker, past a certain point.
You could, however, narrow down the candidates you think might be cheating by looking at the users who benefited overall during the period in question. This will remove the vast majority of users, allowing you to focus your resources better. (This of course will still include users who are simply skilled at poker.)
Once you've done that, you can compare the history of play from while the cheat was possible to the history before or afterwards, and see whether each user's success decreases or increases.
That should give you a list of users who you need to investigate more carefully, possibly by analyzing specific games.
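As a sketch of that before/after comparison, assuming hand results have been flattened into rows (the file name, column names, and exploit window are all made up):

```python
# Sketch: compare each player's results while the exploit was live
# against their results outside that window. Columns are hypothetical.
import pandas as pd

hands = pd.read_csv("hand_results.csv", parse_dates=["timestamp"])
exploit_start, exploit_end = "2009-01-01", "2009-06-30"  # made-up window

in_window = hands["timestamp"].between(exploit_start, exploit_end)
during = hands[in_window].groupby("player")["profit"].mean()
outside = hands[~in_window].groupby("player")["profit"].mean()

report = pd.DataFrame({"during": during, "outside": outside}).dropna()
report["jump"] = report["during"] - report["outside"]
# Players whose results spike only inside the vulnerable window go on
# the shortlist for hand-by-hand review.
print(report.sort_values("jump", ascending=False).head(20))
```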
Enjoy, it's a nice problem.
For all of you expressing disbelief that this is even possible: the community on the poker forums linked in the OP was similarly awestruck, but the site in question has confirmed that such a security vulnerability was present. Quite simply, the site was using very basic and insecure crypto to transmit hole-card data to its players. Theoretically, it would have been possible for anyone aware of this to intercept transmissions from the site to a specific victim (e.g. by being physically nearby and intercepting wireless data), and to cheat that player using the intercepted knowledge.
The question is about how to detect whether this vulnerability was actually exploited (before it was fixed), and if so by whom, given the resources outlined.
Oh, and also: some of you seem to be assuming we're talking about a hypothetical scenario and/or play-money poker; we're not. The site is real, the vulnerability was real, the investigation is really happening (see the link in the OP), and the games under investigation are real-money games with normal buy-ins of $200 and above.
I'm by no means a data-mining expert, and my grasp of statistical analysis of large data sets is pretty weak as well (and I'm not very good at poker, even though I love it) so take everything I say here with a grain of salt.
Weed out the junk data. You only really care about players that fit into two categories: (1) players who win more hands than they lose, and (2) players who win more money than they lose. Who cares about a cheater who loses a lot? Heh.
With this pared-down list of players to actually analyze, I would take a look at their style of play. Assuming you have a lot of historical data, I would build a player skill profile and attempt to normalize their betting strategy. As a poor poker player, I will often back weaker cards that no decent player would back, simply because they feel good. For example, any time I am dealt a face card with another low card (2, 3, 4, 5), if they're suited, I'll almost ALWAYS call any bets made by other players before the turn, even though this strategy is not very successful. Pre-turn raises above the big blind often indicate that a player has a pocket pair, yet my love of playing won't let me fold a suited hand pre-flop.
So for me, your analysis of my play would say that matching aggressive bets pre-flop whenever I have anything suited is normal. But for a different player, who only occasionally calls large pre-flop bets, the same move would be an indication that something might be out of whack.
I don't know what sort of system you'd need to build to profile different users' styles of play, but I imagine you could use some machine learning algorithms to "learn" a person's style of play with pretty decent accuracy.
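As one hedged possibility, you could treat it as unsupervised anomaly detection over per-player aggregates, e.g. with scikit-learn's IsolationForest. The file name and feature names here are made up:

```python
# Sketch: flag players whose aggregate style stats look anomalous, as a
# starting point for a style-of-play profile. Features are made up.
import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.read_csv("player_profiles.csv")  # hypothetical per-player aggregates
features = df[["vpip", "preflop_raise_pct", "went_to_showdown_pct",
               "won_at_showdown_pct", "winrate_bb100"]]

model = IsolationForest(contamination=0.01, random_state=0)
df["flag"] = model.fit_predict(features)  # -1 = outlier, 1 = normal

suspects = df[df["flag"] == -1]
print(suspects[["player_id"]].head())
# Outliers are not proof of cheating, only candidates for manual review
# of their actual hand histories.
```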
You mentioned that a smart cheater would throw hands to minimize his appearance as a cheater. I think this is a GREAT opportunity for more profiling. Would an experienced, winning player play through an awful hand? Probably not, ever. If I were dealt 4S, 7H and saw 9D, JC, AH on the flop, I would know that my chances of winning were really, really small. It also tells us that the cards on the flop aren't very strong for anyone, so anyone at the table betting probably has a jack or ace paired, two pair, or three of a kind. Since you know your 4S, 7H is worthless, you'd either bet hard to bluff the pot or fold outright. Not very many good players (who would be on your shortened list of winning players) would ever stick around in a hand like that.
Anyway, those are the things I've thought of. As for actually implementing them, I have no idea where to even begin, so I'm afraid I can't be of much help there. This is a very interesting academic problem though, so please do us a favor and keep us informed of what you end up going with. If you want to take this conversation offline, feel free to email me at stackoverflow@ericharrison.info.
Could you not look for simple indicators initially, before trying to do anything too complex?
E.g. pre-flop: a player folds pocket kings with no raise before him, and someone else had pocket aces.
This MIGHT indicate that the player knew his starting KINGS (pretty good) were not as good as someone else's pocket ACES... however, that's assuming he makes the decision pre-flop and not post-flop. Depends, really.
Ignore this, just thinking out loud.
To be perfectly honest, I doubt very much that the players who could see opponents' hands were random. There must be some sort of crossover in the code that generates the card view that was selecting some users but not others. I would recommend running tests on this code and trying to find a trend among the "viewers" and "non-viewers". If you find a strong trend, it could be applied to the actual dataset to see which users, which hands, or whatever else was triggering the code fault.
The answer to your question is simple: there is no way to detect that type of cheater with just hand histories. You need information that is not public in order to correlate multiple characteristics and find a suspected cheater.
Oh yeah, and obviously the companies that provide these games do everything possible to set up shop in low-tax, unregulated countries. Until they are regulated, with strict code compliance and testing enforced, this will continue to happen.
The most likely cheating situation would seem to be people working together. Three players at the same table, knowing each other's cards, should be able to make betting adjustments that would allow the pool of colluders to come out ahead.
What stops are in place to prevent collusion?