Recently I've been given an assignment to work on a new project in C++. I've spent some days and nights thinking about how to approach it, but unfortunately the thinking hasn't worked out and I'm currently stuck at the design phase. Basically the premise is:
"The management of Ruddles, a well known local department store has decided to implement certain changes to the customer tills throughout the store. Because of problems with staff in providing customers with correct change for their purchases, in future the change will be calculated automatically from the price of the goods and the amount tendered by the customer. The coins will be dispensed automatically by the tills, which are about to be replaced with machines capable of mechanical dispensing."
This is just some background about the business itself. The following is the requirement specification given to me:
"As at present, the member of staff (the cashier) will enter the purchases, and the system will calculate the total cost. This part of the system currently operates satisfactorily, and no changes (!) are envisaged. The total cost will be passed to the new machine, and the amount tendered by the customer will also be entered.
The system will then calculate the amount of change due, and will provide the hardware interface with a list of coin denominations and the number of each to be dispensed.
This constitutes the first part of the exercise. However, management has just realised that the coin dispensing machines will need to be replenished with coins from time to time, and would like the system to be able to keep track of the numbers of coins of different denominations, and issue warnings when the number of coins of any given denomination falls below a certain value. You should therefore include facilities to provide such provision in your solution.
At the present time, the hardware is unable to dispense notes, but future developments may make it possible, and your solution should be such as to cope with this should these improvements occur. You should therefore indicate what changes would be necessary to the software in the event that such hardware upgrading takes place."
At the moment I'm really unsure how to approach these requirements, so I'm wondering if any of you who are far more knowledgeable and experienced than me can lend some advice or suggestions. Your time and attention would be much appreciated :-)
I'm aware that there may be multiple ways this can be approached, but so far I understand that arrays need to be used, and some kind of persistence structure as well, although I'm not entirely sure.
Fortunately it doesn't have to be overly complicated, so long as it does the job. Again, I appreciate any advice or tips you can give me. Thank you.
The described "software" has two inputs (the total purchase and the amount tendered by the customer) and one output (a list of coin denominations and the number of each coin to dispense). Sounds like the definition of a function to me. Figure out a good data type for each parameter, and write out the signature of your function.
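To make that concrete, here's a minimal sketch of what such a signature might look like, assuming amounts are kept in integer pence to dodge floating-point rounding (the type and names are my own, not part of the assignment):

#include <vector>

// One line of the dispenser's output: a denomination and how many coins of it.
// All amounts are integer pence, which avoids floating-point rounding trouble.
struct CoinBatch {
    int denomination;  // e.g. 50 for a 50p coin
    int count;         // how many of that coin to dispense
};

// Inputs: total cost and amount tendered. Output: the coins to dispense.
std::vector<CoinBatch> calculateChange(int totalCostPence, int tenderedPence);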
If I were you, I'd start simple. Pretend you've only got one coin type, and come up with something that works for that. Then extend the design to two coins of different values. After that you'll have a far better feel for what is needed, and you should be able to see how to extend your solution to cope with any number of denominations (see the greedy sketch below).
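Once you get to many denominations, the natural extension is the classic greedy loop: take as many of the largest coin as possible, then move down. A sketch, assuming standard UK coinage in pence (my assumption, not the assignment's):

#include <vector>

struct CoinBatch { int denomination; int count; };

// Greedy change-making: take as many of the largest coin as fit, then move
// to the next denomination down. Denominations here are UK coinage in pence.
std::vector<CoinBatch> calculateChange(int totalCostPence, int tenderedPence) {
    static const int denominations[] = {200, 100, 50, 20, 10, 5, 2, 1};
    std::vector<CoinBatch> change;
    int remaining = tenderedPence - totalCostPence;
    for (int d : denominations) {
        if (remaining >= d) {
            change.push_back({d, remaining / d});
            remaining %= d;
        }
    }
    return change;
}

Note that the greedy loop is only guaranteed to give the fewest coins for canonical coin systems like UK or US coinage; an arbitrary set of denominations (or the coin-stock limits from the replenishment requirement) would push you towards dynamic programming instead.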
And also forget about the persistence bit until you've got some of the other basics working.
If you try and design the whole thing in your head without actually implementing any code, your brain might explode. So, break the problem down into chunks you can manage.
It sounds like a fun problem, good luck!
I have to develop a personality/job suitability online test for an HR department. Basically, users will answer questions, on a scale of 0-10 for example, and after say 50 questions, I want to translate that to a rating in 5 different personality/ job suitability characteristics.
I don't have any real data to start with, so first: is it even worth it to use a recommendation engine like MyMediaLite (GitHub)? And how many samples will I need to train it to decent performance?
I previously built a training course recommender by simply doing a hand-weighted sum, where each question increased the weight of several courses related to that question. It was an expert system, built like a feed-forward neural network, where I personally tuned all the weights based on my knowledge of the questions and the courses' content.
This time around I would like to use a recommender system, but I'm wondering how many times I would have to take the 50-question test and then assign the results manually. Would 100 examples do? That could be feasible. 1000 would take too long. How can I know ahead of time?
Though it may not be what you want to hear, it is not possible to give a definite number ahead of time. You should focus on the learning curve as you add new samples.
You can process the samples by hand and by engine in parallel, and compare the results given by both (see the sketch below). Once measurements such as recall and precision of the engine's results meet your expectations, you have enough samples.
Hope this is helpful!
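To make the parallel comparison concrete, a tiny sketch of the precision/recall measurement, treating your hand-assigned results as ground truth (all names are mine):

#include <cstdio>
#include <set>
#include <string>

// Treat the hand-assigned results as ground truth and score the engine.
// Precision: fraction of the engine's picks the human also made.
// Recall: fraction of the human's picks the engine reproduced.
void comparePicks(const std::set<std::string>& humanPicks,
                  const std::set<std::string>& enginePicks) {
    int overlap = 0;
    for (const std::string& p : enginePicks)
        if (humanPicks.count(p)) ++overlap;
    double precision = enginePicks.empty() ? 0.0 : (double)overlap / enginePicks.size();
    double recall    = humanPicks.empty()  ? 0.0 : (double)overlap / humanPicks.size();
    std::printf("precision=%.2f recall=%.2f\n", precision, recall);
}

Run something like this each time you add a batch of hand-labelled tests; when the curve flattens out at an acceptable level, you have your sample count.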
My project looks like this: my data set is a bunch of profiles of people, with various attributes, e.g. boolean hasJob and int healthScore, and their income. Using this data, I'm trying to predict their income for the future. Each profile also has a history: e.g., what their attributes and income were in the past.
So in essence I'm trying to map multiple sets of (x booleans, y numbers) to a number (salary in the coming year).
I've considered neural networks, Bayes nets, and genetic algorithms for function-fitting. Any suggestions or input?
Thanks in advance!
--Emily
What you want to do is called "time series modeling". However, you probably have only very little data per series (per person). I think it is difficult to find one model that fits every person, as you'd be making general assumptions such as everyone being equally career-oriented. This is also a very noisy target: it could be that, for example, you have to take into account whether someone is a sweet-talker or not. How do you measure such a thing? I'm pretty sure your current attributes carry enough noise to make it difficult to predict anything.
When you say health status, do you mean physical health only, or mental health too? In different businesses different things are important. What about the business or industry they are working in? Its health and growth potential? I would assume this highly influences their income. I also suspect you have dependent variables, as attributes could be (and likely are) influenced by your target variable, e.g. people with higher income have better health.
It sounds like a very complex and difficult problem, and definitely nothing where "I naively grouped my data and tried a bunch of methods" is going to give meaningful results. I would suggest learning more about time series modeling, and especially about the data that you have. Maybe start by clustering persons by their initial attributes and seeing how they develop (a bare-bones sketch follows). Are there any variables that correlate with this development?
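If you do try that clustering starting point, a plain k-means over the initial attribute vectors could look like this. It's only a sketch, and it assumes you've already encoded each profile as a vector of doubles (all names are mine):

#include <vector>

using Profile = std::vector<double>;  // encoded attributes, e.g. {hasJob, healthScore, ...}

static double dist2(const Profile& a, const Profile& b) {
    double s = 0;
    for (size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Plain k-means: assign each profile to its nearest centroid, then move each
// centroid to the mean of its members; repeat for a fixed number of rounds.
// Initialisation from the first k profiles is deliberately naive.
std::vector<int> kmeans(const std::vector<Profile>& data, int k, int rounds = 20) {
    std::vector<Profile> centroids(data.begin(), data.begin() + k);
    std::vector<int> label(data.size(), 0);
    for (int r = 0; r < rounds; ++r) {
        for (size_t i = 0; i < data.size(); ++i) {
            int best = 0;
            for (int c = 1; c < k; ++c)
                if (dist2(data[i], centroids[c]) < dist2(data[i], centroids[best]))
                    best = c;
            label[i] = best;
        }
        for (int c = 0; c < k; ++c) {
            Profile mean(data[0].size(), 0.0);
            int n = 0;
            for (size_t i = 0; i < data.size(); ++i)
                if (label[i] == c) {
                    for (size_t j = 0; j < mean.size(); ++j) mean[j] += data[i][j];
                    ++n;
                }
            if (n > 0) {
                for (double& m : mean) m /= n;
                centroids[c] = mean;
            }
        }
    }
    return label;
}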
What is your research question?
I remember when I was in college we went over some problem where there was a smart agent on a grid of squares, and it had to clean the squares. It was awarded points for cleaning and deducted points for moving. It had to refuel every now and then, and at the end it got a final score based on how many squares on the grid were dirty or clean.
I'm trying to study that problem, since it was very interesting when I saw it in college; however, I cannot find anything on Wikipedia or anywhere else online. Is there a specific name for that problem that you know of? Or maybe it was just something my teacher came up with for the class.
I've been searching for "AI cleaning agent" and similar things, but I don't find anything. I'm thinking maybe it has some other name.
If you know where I can find more information about this problem I would appreciate it. Thanks.
Perhaps a "stigmergy" approach is closely related to your problem. There is a starting point here, and you can find more by searching for "dead ants" and "robots" on Google Scholar.
Basically: instead of modelling a precise strategy, you work toward a probabilistic approach. Ants (probably) collect their dead by piling them up according to a simple rule such as "if there is a pile of dead ants here, I bring this corpse hither; otherwise, I'll make a new pile". You can start by simplifying your 'cleaning' situation with that, and see where you go.
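As a toy version of that rule for the cleaning setting, the agent's decision can depend only on what it sees locally; the half-saturation constant below is an arbitrary guess of mine, not from any paper:

#include <random>

// Stigmergy-style local rule: the bigger the pile of dirt already collected
// nearby, the more likely the agent is to drop its load there; otherwise it
// keeps carrying and will eventually seed a new pile. No global plan involved.
bool shouldDropLoad(int nearbyPileSize, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double p = nearbyPileSize / (nearbyPileSize + 3.0);  // 3.0 = arbitrary half-saturation
    return u(rng) < p;
}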
Also, I think (another?) suitable approach could be modelled with a Genetic Algorithm using a carefully chosen combination of fitness functions such as:
the end number of 'clean' tiles
the number of steps made by the robot
Of course, if the robot 'dies' of starvation, it automatically removes itself from the gene pool, à la Darwin Awards :)
You could start by modelling a very, very simple genotype that will be 'computed' into a behaviour (a sketch follows). Consider using a simple GA such as this one by Inman Harvey, then to each gene assign either a part of the strategy or a complete behaviour. E.g.: if gene A is turned to 1, then the robot will wander randomly; if gene B is also turned to 1, then it will give priority to self-charging unless there are dirty tiles within distance X. Or use floats and model probability. Your mileage may vary, but I can assure you it will be fun :)
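A sketch of that genotype-to-behaviour mapping, with a fitness combining the two measures from the list above (the gene layout and weights are invented purely for illustration):

#include <bitset>

// Two example genes, as described: gene A enables random wandering,
// gene B makes self-charging take priority unless dirt is within range X.
struct Genotype {
    std::bitset<8> genes;
    bool wanderRandomly() const { return genes[0]; }      // "gene A"
    bool prioritiseCharging() const { return genes[1]; }  // "gene B"
};

// Combined fitness: reward cleaned tiles, penalise steps taken.
// A robot that starved is removed from the pool entirely.
double fitness(int cleanTiles, int steps, bool starved) {
    if (starved) return 0.0;                 // Darwin Award: out of the gene pool
    return cleanTiles * 10.0 - steps * 0.1;  // weights are arbitrary illustrations
}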
The problem is reminiscent of Shakey, although there's cleaning involved (which is like the Roomba -- a device that can also be programmed to perform these very tasks).
If the "problem space" (or room) is small enough, you can solve for an optimal solution using a simple A*-based search, but likely it won't be, since that won't leave for very interesting problems.
The machine learning approach suggested here using genetic algorithms is an interesting approach. Given the problem domain you would only have one "rule" (a move-to action, since clean could be eliminated by implicitly cleaning any square you move to that is dirty) so your learner would essentially be learning how to move around an environment. The problem there would be to build a learner that would be adaptable to any given floor plan, instead of just becoming proficient at cleaning a very specific space.
Whatever approach you take, I'd also consider a further meta-reasoning step if the problem sets are big enough: use a partition approach to divide the floor into separate areas, and then conquer them one at a time.
Can you use techniques to create data to use "offline"? In that case, I'd even consider creating a "database" of optimal routes for cleaning certain floor spaces (1x1 up to, say, 5x5) that includes all possible start and end squares. This is similar to the "endgame databases" that game AIs use to effectively "solve" games once they reach a certain depth (cf. Chinook).
This problem reminds me of this. A similar problem is briefly mentioned in the book Complexity as an example of a genetic algorithm. Those versions are simplified, though; they don't take fuel consumption into account.
It recently emerged on a large poker site that some players were possibly able to see all of their opponents' cards as they played, by exploiting a security vulnerability that was discovered.
A naïve cheater would win at an incredibly fast rate; such cheats are usually caught very quickly, and if not caught quickly they are easy to detect with a quick scan through their hand histories.
The more difficult problem occurs when the cheater exhibits intelligence: bluffing in spots where they are bound to be called, calling river bets with the worst hands. The basic premise is that they lose pots on purpose to disguise their ability to see other players' cards, and they win at a reasonably realistic rate.
Given:
A data set of millions of verified and complete-information hand histories
Theoretically unlimited computing power
Assume the game is No Limit Hold'em, although suggestions on Omaha or limit poker may also be beneficial
How could we reasonably accurately classify these cheaters? The original 2+2 thread appeals for ideas, and I thought that the SO community might have some useful suggestions.
It's an interesting problem, not least because it is current and has real application in bettering the world if someone finds a creative solution: there is a good chance genuine players will have their funds refunded once cheaters are identified.
Plot VPIP (voluntarily put $ in pot) versus win rate for all players with a statistically significant number of hands played. You should see outliers with the naked eye (see the sketch below for a mechanical pre-filter). I think that's the basic thing to do first.
Then you can plot WTSD vs win rate, win rate at showdown vs win rate without showdown, % won at showdown vs VPIP.
The stats you choose must be statistically significant. If you know poker, the above choices make sense.
This is not a job for a machine; outliers are detected by eye.
EDIT: Omaha is much tougher, since its variance is far higher. There are cases of unbelievable streaks by weak players who were not cheating.
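The eyeballing step can at least be pre-filtered mechanically. Here is a sketch that shortlists players whose win rate sits several standard deviations above the field; the struct and thresholds are mine, and this proves nothing by itself:

#include <cmath>
#include <vector>

struct PlayerStats {
    int handsPlayed;
    double vpip;     // % of hands where the player voluntarily put money in
    double winRate;  // e.g. big blinds won per 100 hands
};

// Flag players whose win rate is an extreme outlier among players with a
// statistically meaningful sample. Output is a shortlist for a human to eyeball.
std::vector<size_t> flagOutliers(const std::vector<PlayerStats>& players,
                                 int minHands = 50000, double zCutoff = 4.0) {
    std::vector<double> rates;
    for (const PlayerStats& p : players)
        if (p.handsPlayed >= minHands) rates.push_back(p.winRate);
    if (rates.size() < 2) return {};
    double mean = 0, var = 0;
    for (double r : rates) mean += r;
    mean /= rates.size();
    for (double r : rates) var += (r - mean) * (r - mean);
    double sd = std::sqrt(var / rates.size());
    std::vector<size_t> flagged;
    for (size_t i = 0; i < players.size(); ++i)
        if (players[i].handsPlayed >= minHands && sd > 0
            && (players[i].winRate - mean) / sd > zCutoff)
            flagged.push_back(i);
    return flagged;
}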
I hate to be so blunt, but all the answers on this page, with the exception of @Erwin Smout's, are worthless.
Statistical analysis is a joke for identifying poker cheats
I realize the question allows for millions of hands' worth of history to be available to the system. I'm sure there are players with hand histories this large; hell, I've probably played this many online hands. But I've also been playing online for over 10 years. That's not a small amount of time, and it is my understanding that two conflicting things are true when it comes to identifying online poker cheaters: it needs to happen in a small amount of time, and, like any good thief, an online poker cheat is going to take his stash elsewhere immediately after the taking.
There was a great example of the variance in poker in this paper, generated by matching an always-raise player against an always-call player (page 13 of the PDF). Over the course of 100,000 hands (way more than I think most people would be willing to play against someone who could see their cards), the always-call player won on average 0.026 small blinds per hand. I know this does not sound like much, but that's 2,600 small blinds over the 100,000 hands, and assuming stakes of $5-10 it comes out to $6,500. Maybe someone can help me find the link, but the measured professional win rate is not too much larger than this. Please note, NEITHER of these players was cheating, and the statistically expected difference over this number of hands is significantly less than what actually transpired.
What online poker players need to understand
Poker is gambling. It is a game of skill, because some players are able to elicit more information from their opponents than their opponents are able to gather, and that extra information is often as useful as seeing other people's cards. But the variance is enormous, and even players who are better than their typical opponents can end up long-term losers. If you do not understand this, you're just hunting witches with statistics over the arbitrarily small number of hands you'll be playing against any one opponent.
What can be done?
Keeping in mind that the question states cheaters are able to see the other players' cards, you don't need statistical analysis to identify them. There are only three ways in which that is possible.
First, the server could be intentionally sending the information to clients, which is an obvious security issue and should never be implemented (IMO, not even for moderators). If a site were found allowing this to happen, it is the players' responsibility to move their funds elsewhere, or to refuse to play on the site until that terrible design decision is rectified. It should also be the responsibility of sites to inform their players of the exact steps that take place during hands played on the site, so players have that information when choosing a site in the first place. Security by obscurity is impermissible. As for catching the thieves, this information should be sitting in log files on their servers, which should be regularly audited for this type of behavior.
Second, the user could have hacked the poker server. The site would know about that in a hurry, or, once it is exposed, it is again the players' responsibility to decide where to play. In this case, the cheater can be prosecuted in most countries.
Lastly, it is possible the dealing algorithm has been cracked. This was a major problem in the past with companies that used naive methods to deal hands, but most of the major shops solved it by taking random inputs from players logged into their system, as well as using entropy-generating hardware to seed their random number generator. That's not to say it cannot be cracked, however. If this is the case, the only option is for the company to engineer a new random number generator.
Well, IT people get fascinated by all kinds of wrong questions.
A better question is "how is cheating even possible?". There is no need whatsoever to send the opponents' hands over the wire before the showdown. If that data isn't sent to the client, then how could anyone cheat?
They'd need to break into the server. Don't tell me that isn't preventable.
I think if they cheat intelligently, winning not too many pots, it won't be detectable. I don't believe you could tell the difference between luck and cheating here.
But I would like to know at which online poker provider the cheating is possible, because I can't imagine how to do it if the poker software is coded properly. If I were asked to program online poker software, users wouldn't be able to see their opponents' cards, because there would be no way for them to get that information. This is how I would do it:
Every connection between users and the server is encrypted.
No communication between users; users can only talk to the server.
The server tells every user only the cards that user should see, and no other cards, until the round is finished and the players reveal their hands (see the sketch below).
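The third point is the crucial one. Here is a sketch of what "the server tells every user only their own cards" means at the message level; all the types and names are invented for illustration:

#include <array>
#include <string>
#include <vector>

struct Card { int rank; int suit; };

struct PlayerSeat {
    std::string accountId;
    std::array<Card, 2> holeCards;  // dealt and kept server-side only
};

// Built per recipient: contains that player's own hole cards and nothing
// about anyone else's. Opponents' cards simply never cross the wire
// until showdown, so there is nothing for a client-side cheat to decode.
struct DealMessage {
    std::array<Card, 2> yourHoleCards;
    int yourSeatNumber;
};

DealMessage buildDealMessageFor(const std::vector<PlayerSeat>& table, int seat) {
    return DealMessage{table[seat].holeCards, seat};
}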
The only way users could cheat here is to get together with other players, or to impersonate multiple players with different accounts and IP addresses, and open another channel of communication between them. That way the group has a big advantage, because they know more than just their own cards, but there is still no way they can see anyone else's cards. And because it is now a group that is cheating, it is even harder to detect: they can share their earnings among multiple players, and the group could even include a player who loses more than he or she gains and still win overall.
Past a certain point, I doubt you can say with any certainty whether someone is cheating or whether they are just good at poker.
You could, however, narrow the candidates by looking at the users who benefited overall during your time period. This will remove the vast majority of users, allowing you to focus your resources better. (It will, of course, still include users who are simply skilled at poker.)
Once you've done that, you can compare each user's history of play from while the cheat was possible against their history before or after, and see whether their success increases or decreases.
That should give you a list of users who need to be investigated more carefully, possibly by analyzing specific games.
Enjoy, it's a nice problem.
For all of you expressing disbelief that this is even possible: the community on the poker forums linked in the OP was similarly awestruck, but the site in question has confirmed that such a security vulnerability was present. Quite simply, the site was using very basic and insecure crypto to transmit hole-card data to its players. Theoretically, it would have been possible for anyone aware of this to intercept transmissions from the site to a specific victim (e.g. by being physically nearby and intercepting wireless data), and to cheat that player using the intercepted knowledge.
The question is about how to detect whether this vulnerability was actually exploited (before it was fixed), and if so by whom, given the resources outlined.
Oh, and also: some of you seem to be assuming we're talking about a hypothetical scenario and/or play-money poker; we're not. The site is real, the vulnerability was real, the investigation is really happening (see the link in the OP), and the games under investigation are real-money games with normal buy-ins of $200 and above.
I'm by no means a data-mining expert, and my grasp of statistical analysis of large data sets is pretty weak as well (and I'm not very good at poker, even though I love it) so take everything I say here with a grain of salt.
Weed out the junk data. You really only care about players that fit into two categories: (1) players who win more hands than they lose, and (2) players who win more money than they lose. Who cares about a cheater who loses a lot? Heh.
With this pared-down list of players to analyze, I would take a look at their style of play. Assuming you have a lot of historical data, I would build a player skill profile and attempt to normalize their betting strategy. As a poor poker player, I normally back weaker cards that no decent player would back, simply because they feel good. For example, any time I am dealt a face card with another low card (2, 3, 4, 5), if they're suited, I'll almost ALWAYS call any bets made by other players before the turn, even though this strategy is not very successful. Pre-turn raises above the big blind often indicate a player has a pocket pair, yet my love of playing won't let me fold a suited hand pre-flop.
So for me, your analysis of my play would say that matching aggressive pre-flop bets whenever I have anything suited is normal. But for a different player, one who only occasionally calls large pre-flop bets, it would be an indication that something might be out of whack.
I don't know what sort of system you'd need to build to profile different users' styles of play, but I imagine you could use some machine learning algorithms to "learn" a person's style of play with pretty decent accuracy.
You mentioned that a smart cheater would throw hands to minimize his appearance as a cheater. I think this is a GREAT opportunity for more profiling. Would an experienced, winning player play through an awful hand? Probably not, ever. If I were dealt 4S 7H and saw 9D JC AH on the flop, I would know that my chances of winning were really, really small. It also tells us that the flop isn't very strong for anyone, so anyone at the table betting probably has a jack or ace paired, two pair, or three of a kind. Since you know your 4S 7H is worthless, you'd either bet hard to bluff the pot or fold outright. Not many good players (who would be on your shortened list of winners) would ever stick around on a hand like that.
Anyway, those are the things I've thought of. Now actually implementing them, I have no idea where to even begin so I'm afraid I can't be of much help there. This is a very interesting academic problem though, so please do us a favor and keep us informed of what you end up going with. If you want to take this conversation offline, feel free to email me at stackoverflow#ericharrison.info.
Could you not look for simple indicators initially, before trying to do anything too complex?
E.g., pre-flop: a player folds pocket kings with no raise before him, and someone else had pocket aces.
This MIGHT indicate that the player knew his starting KINGS (pretty good) were not as good as someone else's pocket ACES. However, that assumes he makes the decision pre-flop and not post-flop. It depends, really.
Ignore this, just thinking out loud.
To be perfectly honest, I'd doubt very much that the players who could see opponents' hands were random. There must be some sort of crossover in the code that generates the card view that was selecting some users but not others. I would recommend running tests on this code and trying to find a trend among the "viewers" and "non-viewers". If you find a strong trend, then it could be applied to the actual dataset to see which users, which hands, or whatever else was triggering the code fault.
The answer to your question is simple: there is no way to detect that type of cheater with just hand histories. You need information that is not public in order to correlate multiple characteristics and find a suspected cheater.
Oh yeah, and obviously the companies that provide these games do everything possible to set up shop in low-tax, non-regulated countries. Until they are regulated, with strict code compliance and testing enforced, this will continue to happen.
The most likely cheating situation would seem to be people working together. Three players at the same table knowing each other's cards should be able to make betting adjustments that would allow the pool of bettors to come out ahead.
What stops are in place to prevent collusion?
This is a question not really about "programming" (it is not specific to any language or database) but more about design and architecture. It's also a question of the type "what's the best way to do X?". I hope it doesn't cause too much "religious" controversy.
In the past I have developed systems that, in one way or another, keep some form of inventory of items (what items is not relevant), some using languages/DBs that do not support transactions. In those cases I opted not to save the quantity on hand in a field in the item record. Instead, the quantity on hand is calculated as total inventory received minus total inventory sold. This has resulted in almost no discrepancies in inventory due to software. The tables are properly indexed and the performance is good. There is an archiving process for when the number of records starts to affect performance.
Now, a few years ago I started working at this company, and I inherited a system that tracks inventory. But the quantity is saved in a field. When an entry is registered, the quantity received is added to the quantity field for the item; when an item is sold, the quantity is subtracted. This has resulted in discrepancies. In my opinion this is not the right approach, but the previous programmers here swear by it.
I would like to know whether there is a consensus on the right way to design such a system, and also what resources are available, printed or online, for guidance on this.
Thanks
I have seen both approaches at my current company and would definitely lean towards the first (calculating totals based on stock transactions).
If you are only storing a total quantity in a field somewhere, you have no idea how you arrived at that number. There is no transactional history and you can end up with problems.
The last system I wrote tracks stock by storing each transaction as a record with a positive or negative quantity. I have found it works very well.
The Data Model Resource Book, Vol. 1: A Library of Universal Data Models for All Enterprises
The Data Model Resource Book, Vol. 2: A Library of Data Models for Specific Industries
The Data Model Resource Book: Universal Patterns for Data Modeling
I have Vol 1 and Vol 2 and these have been pretty helpful in the past.
It depends; inventory systems are about far more than just counting items. For example, for accounting purposes you might need to know the accounting value of inventory based on the FIFO (first-in-first-out) model. That can't be calculated by a simple "total inventory received - total inventory sold" formula, but their model might calculate it easily, because they modify the accounting value as they go. I don't want to go into details, because this is not a programming issue, but if they swear by it, maybe you don't fully understand all the requirements they have to accommodate.
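To make the FIFO point concrete: valuing remaining stock requires knowing which receipt lots are still open, which a bare quantity total cannot reconstruct. A sketch (the structures are mine):

#include <deque>

struct ReceiptLot { int quantity; double unitCost; };

// FIFO: sales consume the oldest receipts first, so the accounting value
// of remaining stock depends on *which* lots are left, not just how many
// units there are. A bare quantity-on-hand total cannot reconstruct this.
double sellFifo(std::deque<ReceiptLot>& lots, int quantitySold) {
    double costOfGoodsSold = 0.0;
    while (quantitySold > 0 && !lots.empty()) {
        ReceiptLot& oldest = lots.front();
        int taken = (oldest.quantity < quantitySold) ? oldest.quantity : quantitySold;
        costOfGoodsSold += taken * oldest.unitCost;
        oldest.quantity -= taken;
        quantitySold -= taken;
        if (oldest.quantity == 0) lots.pop_front();
    }
    return costOfGoodsSold;
}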
Both are valid, depending on the circumstances. The former is best when the following conditions hold:
the number of items to sum is relatively small
there are few or no exceptional cases to consider (returns, adjustments, et al)
the inventory item quantity is not needed very often
On the other hand, if you have a large number of items, several exceptional cases, and frequent access, it will be more efficient to maintain the item quantity.
Also note that if your system has discrepancies, then it has bugs, which should be tracked down and eliminated.
I have done systems both ways, and both ways can work just fine, as long as you don't ignore the bugs!
It's important to consider the existing system and the cost and risk of changing it. I work with a database that stores inventory kind of like yours does, but it includes audit cycles and stores adjustments just like receipts. It seems to work well, but everyone involved is well trained, and the warehouse staff aren't exactly quick to learn new procedures.
In your case, if you're looking for a little more tracking without changing the whole DB structure, then I'd suggest adding a tracking table (like the one from your 'transaction' solution) and logging changes to the inventory level. It shouldn't be too hard to update most changes to the inventory level so that they also leave a transaction record. You could also add a periodic task to back up the inventory level to the transaction table every couple of hours, so that even if you miss a transaction you can discover when the change happened, or roll back to a previous state.
If you want to see how a large application does it, take a look at SugarCRM; they have an inventory management module, though I'm not sure how it stores the data.
I think this is actually a general best-practices question about doing a (relatively) expensive count every time you need a total vs. doing that count every time something changes, then storing the count in a field and reading that field whenever you need a total.
If I couldn't use transactions, I would go with the live count every time I needed a total. If transactions are available, it would be safe to perform the inventory update operations and the saving of the re-counted total within the same transaction, which would ensure the accuracy of the count (although I'm not sure this would work with multiple users hitting the database).
But if performance is not really a huge problem (and modern databases are good enough at counting rows that I would rarely even worry about this) I'd just stick with the live count each time.
I would opt for the first way, where "the quantity on hand is calculated totaling inventory received - total of inventory sold". The Right Way, IMO.
EDIT: I would also want to factor any stock losses/damages into the system, but I'm sure you have that covered.
I've worked on systems that solve this problem before. I think the ideal solution is a precomputed column, which gets you the best of both worlds: your total is a field somewhere, so there are no expensive lookups, but it can't get out of sync with the rest of your data (the database maintains the integrity). I don't remember which RDBMSs support precomputed columns, but if yours doesn't have transactions, that feature might not be available either.
You could potentially fake precomputed columns (very effectively... I see no downside) using triggers. You'd probably need transactions though. IMHO, keeping data integrity when you're doing this sort of controlled denormalization is the only legitimate use for a trigger.
Django-inventory is geared more to fixed assets, but it might give you some ideas.
E.g.: ItemTemplate (class) -> ItemsOnHand (instance).
An ItemsOnHand can be linked to more ItemTemplates; for example, a printer and the ink cartridges it requires. This also allows you to set reorder points for each ItemsOnHand.
Each ItemsOnHand is linked to InventoryTransactions, which allows for easy auditing.
To avoid calculating the actual on-hand quantity from thousands of inventory transactions, checkpoints are used, which are just a balance plus a date. To calculate items on hand, query for the most recent checkpoint and add or subtract the transactions since then to find the current balance (see the sketch below). Define new checkpoints periodically.
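The checkpoint idea in miniature, over in-memory structures rather than SQL (the names are mine):

#include <vector>

struct Checkpoint { long date; int balance; };          // balance as of `date`
struct InventoryTx { long date; int quantityDelta; };   // +received / -sold

// On-hand = most recent checkpoint balance + all transactions since it.
// This avoids summing the entire transaction history on every lookup.
int itemsOnHand(const Checkpoint& latest, const std::vector<InventoryTx>& txs) {
    int balance = latest.balance;
    for (const InventoryTx& tx : txs)
        if (tx.date > latest.date) balance += tx.quantityDelta;
    return balance;
}

This mirrors what accounting systems call a period-end closing balance.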
I can see some benefit to having the two columns, but I'm not following the part about discrepancies - you seem to be implying that having the two columns (in and out) is less prone to discrepancy than a single column (current). Why is that?
It's not about having one or two columns. What I meant by "totaling inventory received - total of inventory sold" is something like this:
SELECT SUM(quantity) AS inventory_received FROM Inventory_entry;
SELECT SUM(quantity) AS inventory_sold FROM Sales_items;
then
Quantity_on_hand = inventory_received - inventory_sold
Please keep in mind that I oversimplified this in my initial explanation. I know there is much more to inventory than just keeping track of quantities, but in this case that's where the problem lies and what we want to fix. At this point, the reason to change it is precisely the cost of supporting the problems caused by the current design.
Also, I wanted to mention that although this is not a "coding" question, it is related to algorithms and design, which IMHO are very important topics.
Thanks everybody for your answers so far.
Nelson Marmol
We solve different problems, but our approach to some of them might be interesting to you.
We allow the system to make a "best guess", and give the users regular feedback about any of those guesses that look wrong.
To apply this to inventory, you could have 3 fields:
inventory_received
inventory_sold
estimated_on_hand
Then, you could run a process (daily?) along the lines of:
SELECT *
FROM Inventory
WHERE estimated_on_hand != inventory_received - inventory_sold
Of course, this relies on users looking at this alert, and doing something about it.
Also, you could have a function to reset inventory somehow, either by updating inventory_sold/received, or perhaps by adding another field, "inventory_adjustment", which could be positive or negative.
... just some thoughts. Hope it's helpful.