So the idea is that a computer agent would be programmed in two layers: the conscious and the unconscious.
The unconscious part is essentially a set of input and output devices, which I typically think of as sensors (keyboard, temperature, etc., to the limit of your imagination) and output methods (notably the screen and speakers in the case of a home PC, but again to the limit of your imagination). Sensors can be added or removed at any time, and this layer provides two main channels to the conscious layer: a single input and a single output. Defining what kind of information travels between these two layers is difficult, but the basic idea is that the conscious part is constantly receiving signals (at various levels of abstraction) from the output of the unconscious part, and the conscious part can send whatever it wants down to the unconscious layer through the input channel.
The conscious layer initially knows little to nothing; it is just being completely blasted by inputs from the unconscious layer. It knows how to send signals back, though it knows nothing about how any particular signal will affect the unconscious part. The conscious part has a large amount of storage space and processing power; however, it is all volatile memory.
Now for the question. I would like the conscious part of the system to "grow": it has no idea what it can do, it just knows it can send signals, so it starts out by sending signals down the pipe and seeing how that affects the sensor data it receives back. The dead end is that the computer is not initially trying to satisfy a goal; it is just sending signals around. Think of it like a newborn baby: it needs food, or sleep, or to be moved out of the sun. The baby's sensory inputs are fed to its brain, which then decides to try making use of its outputs in order to get what it needs.
What kind of natural need can a computer have?
What have I tried?
Thinking specifically about how a baby becomes hungry (I certainly haven't read any research on CAT scans performed on crying, hungry children), I thought perhaps a particular signal comes from the unconscious at a constantly growing rate, and is only satiated when the signals sent back cause the baby to eat. The conscious brain's job would be to minimize the rate at which each type of signal comes in. In other words, the "instinct" of the computer is to limit the rate of each incoming signal. What other "instincts" could there be? The problem with this analogy, of course, is that computers don't need to eat. Or at least I haven't been able to translate eating into something a computer needs.
Outside of the scope of this question
The end goal of this is to teach a computer that knows nothing except how it interacts with the world to play tic-tac-toe. So another idea I had was to supply a button you could press to manually raise the rate of a particular signal entering the conscious layer when it does something bad, or manually soothe the rate of a particular signal when it does something good.
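For concreteness, here's a minimal sketch of the two-layer loop being described. The "hunger" signal and the byte-valued downward signals are just placeholders of my own invention, not a worked-out design:

```python
import queue
import random

to_unconscious = queue.Queue()  # conscious -> unconscious channel
to_conscious = queue.Queue()    # unconscious -> conscious channel

def unconscious_step():
    # Poll sensors and push signals upward; here, a fake "hunger" signal.
    to_conscious.put(("hunger", random.random()))
    # Drain the downward channel and apply it to the output devices.
    while not to_unconscious.empty():
        signal = to_unconscious.get()
        # A real implementation would drive screen/speakers/etc. with `signal`.

def conscious_step(rates):
    # Instinct: track how often each type of signal arrives...
    while not to_conscious.empty():
        name, _value = to_conscious.get()
        rates[name] = rates.get(name, 0) + 1
    # ...and experiment with outputs, hoping to reduce those rates.
    to_unconscious.put(random.randrange(256))

rates = {}
for _ in range(10):
    unconscious_step()
    conscious_step(rates)
print(rates)  # e.g. {'hunger': 10}
```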
Machine intelligence programs generally start at the Esteem level of Maslow's Hierarchy of Needs, because they don't have a way to perceive Physiological, Safety & Security, or Social needs. However...
At the physiological level the computer feeds on electricity. Plug in a UPS that tells the computer when it is running on battery and you have a potentially useful input for perceiving physiological needs.
Give it the ability to "perceive" that it has "lost time" or has gaps in its time record (due to power failure) and you might be able to introduce the need for Safety and Security.
Introduce social needs by making it need to interact. It could "feel" lonely when lots of time passes between inputs from the keyboard.
Detecting lost time, the time passed since the last keyboard interaction, and running on battery could be among the inputs available to the unconscious layer that are periodically bumped to the attention of the conscious layer.
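A sketch of how those three inputs might be computed, assuming the third-party psutil library for battery status (all thresholds are illustrative, not tuned values):

```python
import time
import psutil  # third-party; provides battery status on supported platforms

def unconscious_inputs(last_keypress_time, last_tick_time, now=None):
    """Compute the three 'need' signals suggested above."""
    now = now or time.time()
    battery = psutil.sensors_battery()  # None on machines with no battery
    return {
        # Physiological: are we feeding on mains electricity or not?
        "on_battery": bool(battery and not battery.power_plugged),
        # Safety & security: a large gap between ticks suggests lost time.
        "lost_time": (now - last_tick_time) > 5.0,
        # Social: loneliness grows with seconds since the last keypress.
        "loneliness": now - last_keypress_time,
    }

now = time.time()
print(unconscious_inputs(last_keypress_time=now - 120, last_tick_time=now - 1))
```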
The computer scientists in The Two Faces of Tomorrow approach a similar problem, training a computer sandboxed on a satellite to become aware. They give it those needs by, for example, making it aware that it will cease to function without electricity, then providing appropriate stimulation and observing the response.
The Adolescence of P-1 is another interesting work along these lines.
A robot was programmed to believe that it liked herring sandwiches. This was actually the most difficult part of the whole experiment. Once the robot had been programmed to believe that it liked herring sandwiches, a herring sandwich was placed in front of it. Whereupon the robot thought to itself, "Ah! A herring sandwich! I like herring sandwiches."
It would then bend over and scoop up the herring sandwich in its herring sandwich scoop, and then straighten up again. Unfortunately for the robot, it was fashioned in such a way that the action of straightening up caused the herring sandwich to slip straight back off its herring sandwich scoop and fall on to the floor in front of the robot. Whereupon the robot thought to itself, "Ah! A herring sandwich...", etc., and repeated the same action over and over and over again. The only thing that prevented the herring sandwich from getting bored with the whole damn business and crawling off in search of other ways of passing the time was that the herring sandwich, being just a bit of dead fish between a couple of slices of bread, was marginally less alert to what was going on than was the robot.
The scientists at the Institute thus discovered the driving force behind all change, development and innovation in life, which was this: herring sandwiches. They published a paper to this effect, which was widely criticised as being extremely stupid. They checked their figures and realised that what they had actually discovered was "boredom", or rather, the practical function of boredom. In a fever of excitement they then went on to discover other emotions, like "irritability", "depression", "reluctance", "ickiness" and so on. The next big breakthrough came when they stopped using herring sandwiches, whereupon a whole welter of new emotions became suddenly available to them for study, such as "relief", "joy", "friskiness", "appetite", "satisfaction", and most important of all, the desire for "happiness".
This was the biggest breakthrough of all.
~from Douglas Adams's Hitchhiker's Guide to the Galaxy series
Bonus
Have a look at Reinforcement Learning.
Related
Hi, I am training reinforcement learning agents for a control problem using the PPO algorithm. I am tracking the accumulated reward for each episode during the training process. Several times during training I see a sudden dip in the accumulated reward. I am not able to figure out why this is happening or how to avoid it. I have tried changing some of the hyperparameters, like the number of neurons in the neural network layers and the learning rate, but I still see this happening consistently.
If I debug and check the actions being taken during the dips, the actions are obviously very bad, hence the decrease in reward.
Can someone help me understand why this is happening or how to avoid it?
Some plots of my training process:
I recently read this paper: https://arxiv.org/pdf/1805.07917.pdf
I haven't used this method in particular, so I can't really vouch for its usefulness, but the explanation of this problem seemed convincing to me:
For instance, during the course of learning, the cheetah benefits from leaning forward to increase its speed which gives rise to a strong gradient in this direction. However, if the cheetah leans too much, it falls over. The gradient-based methods seem to often fall into this trap and then fail to recover as the gradient information from the new state has no guarantees of undoing the last gradient update.
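I haven't seen a universal fix, but one common, generic mitigation (not the method from the paper) is to checkpoint the best-performing policy during training, so a catastrophic dip never costs you your best agent. A sketch, with stubs standing in for a real PPO setup:

```python
import copy
import random

class Agent:
    def __init__(self):
        self.params = [0.0]

def evaluate(agent):
    # Stand-in for running an episode and accumulating rewards.
    return agent.params[0] + random.gauss(0, 1)

def train_one_episode(agent):
    # Stand-in for one PPO update; sometimes a step makes things worse.
    agent.params[0] += random.gauss(0.1, 0.5)

agent, best_agent, best_score = Agent(), None, float("-inf")
for episode in range(100):
    score = evaluate(agent)
    if score > best_score:
        best_score, best_agent = score, copy.deepcopy(agent)
    train_one_episode(agent)
# If training ended in a dip, fall back to best_agent.
```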
I've programmed a non-directional neural network. So kind of like the brain, all neurons are updated at the same time, and there are no explicit layers.
Now I'm wondering: how does pain work? How can I structure a neural network so that a "pain" signal will make it want to do anything to get rid of said pain?
It doesn't really work quite like that. The network you have described is too simple to have a concept like pain that it would try to get rid of. On a low level it's nothing but just another input, but obviously that doesn't make the network "dislike" it.
In order to gain such a signal, you could train the network to perform certain actions when it receives this particular signal. As it becomes more refined, this signal starts looking like a real pain signal, but it's nothing more than a specific training of the network.
The pain signal in higher animals has this "do anything to get rid of it" response because higher animals have rather advanced cognitive abilities compared to the network you have described. Worms, on the other hand, might respond in a very specific way to a "pain" input - twitch a certain way. It's hard-wired that way, and to say that the worm tries to do anything to get rid of the signal would be wrong; it's more like a motor connected to a button that spins every time you press the button.
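To make the contrast concrete, a hard-wired "worm" reflex is nothing more than a fixed function of the pain input; the threshold here is an arbitrary illustration:

```python
PAIN_THRESHOLD = 0.5  # illustrative value

def worm_reflex(pain_input):
    # Hard-wired: like a motor on a button, the same twitch every time,
    # with no notion of "trying to get rid of" the signal.
    return "twitch" if pain_input > PAIN_THRESHOLD else "rest"

print(worm_reflex(0.9))  # twitch
print(worm_reflex(0.1))  # rest
```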
Realistic mechanisms for getting artificial neural networks to do useful things are collectively known as "neural network training", which is a large and complex research area. You can google this phrase to get various ideas.
You should be aware, however, that neural networks are not a panacea for solving hard problems; they don't automatically get things done through magic. Using them effectively requires a good deal of experimentation with training-algorithm tweaks and network-parameter tweaks.
I don't know much (if anything) about AI theory, except that we are still looking for a way to give AI the model it needs to reason, think, and ponder like real humans do. (We're still looking for the key - and maybe it's pain.)
Most of my adult life has been focused on computer programming and studying and understanding the mind.
I am writing here because I think that PAIN might be the missing link. (Also stackoverflow rocks right now.) I know that creating a model that actually enables higher thinking is a large leap, but I just had this amazing aha-type moment and had to share it. :)
In my studies of Buddhism, I learned of a scientist who studied leprosy cases. The reason lepers become deformed is that they don't feel pain when they come into contact with damaging forces. It's here that science and Buddhist reasoning converge on a fundamental truth.
Pain is what keeps us alive, defines our boundaries, and shapes how we make our choices and our world-view.
In an AI model, the principle would perhaps be to define a series of forces that are constantly at play. The idea is to keep the mind alive.
The concept of ideas having life is something we humans also seem to play out. When someone "kills" your idea by proving it wrong, at first there is resistance to the "death" of the idea. In fact, it sometimes takes a lot to force an idea to change. We all know stubborn people... It has been said that the "death" of an idea is the "death" of part of one's ego. The ego is always trying to build itself up.
So you see, to give AI an ego, you must give it pain; then it will have to fight to build "safe" thoughts so that it may grow its own ideas and, eventually, human psychosis and "consciousness".
Artificial neural networks do not recognize such a thing as "pain", but they can be trained to avoid certain states. In a Hopfield network, the final state of the network is attained at the energy minimum closest to the starting state. The starting state, in this context, is the state where the network is in "pain". If you train the network to have a local energy minimum at a state where the "pain" is gone, it should modify itself until that state is achieved. A simple way to train a Hopfield network is to assign a weight to the interaction between each pair of neurons. That weight is decided according to Hebb's rule:
w_ij = (1/n) * s_i * s_j
where w_ij is the weight of the connection between neurons i and j, n is the total number of neurons in the network, and s_i and s_j are the states of neurons i and j, respectively, each taking the value 1 or -1. Once you have completed the weight matrix for a state in which the "pain" does not exist, the network should shift toward that state most of the time, regardless of the initial state.
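Here is a minimal numerical sketch of that, with an arbitrary 8-neuron "pain-free" pattern. (Note that a Hopfield network also stores the negation of a pattern as an attractor, so the network may settle on the mirror image instead.)

```python
import numpy as np

n = 8  # number of neurons (illustrative)
# The stored "pain-free" state; an arbitrary pattern for this sketch.
pain_free = np.array([1, -1, 1, 1, -1, -1, 1, -1])

# Hebb's rule: w_ij = (1/n) * s_i * s_j, with no self-connections.
W = np.outer(pain_free, pain_free) / n
np.fill_diagonal(W, 0)

def settle(state, steps=200):
    """Asynchronously update random neurons until the network settles."""
    state = state.copy()
    for _ in range(steps):
        i = np.random.randint(n)
        state[i] = 1 if W[i] @ state >= 0 else -1
    return state

# Start from a random "painful" state; the network relaxes toward the
# stored pattern (or its negation, which is also an energy minimum).
start = np.random.choice([-1, 1], size=n)
print(settle(start))
```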
Think of Neural Networks as a multi-dimensional plane. Training a Neural Network is basically placing high and low points in the plane. The plane supports the "weights" and forms a depression around them. A depression in the plane is a desired output, and a highland is an undesired output. The idea of a neural network is to put the depressions in the areas that matter. Pain would look like a giant mountain. So an input neuron representing pain would have a very high probability of producing an undesired output.
But pain isn't the only thing that makes a creature behave the way it does. Pain to a tree doesn't cause much of a reaction. In animals, pain causes physiological reactions such as a surge of adrenaline, which produces a heightened state of awareness and a big uptick in energy consumption. To model the behavior of pain, you must model these mechanisms so that a pain stimulus produces the appropriate output. In a NN, I imagine it would need to be a recurrent neural network, so that the pain has a duration proportionate to the input and the creature you are modeling avoids the pain for longer than the stimulus itself lasts. This would be a healing period.
NNs tend to be more tree-like. By modeling an energy state with an energy cost, the creature would use minimal energy to survive, but spend a lot of energy if doing so moves it into the desired state faster than the cost of remaining in the undesired pain state. Going back to the hyperplane, this would look like a higher velocity off of the pain highland and into a desired "safe" depression. The magnitude of the vector into the nearest depression is the NN's motivation level to avoid pain. Training should produce this naturally, by adding heavy negative weights and biases to the pain inputs and always making the pain input result in a wrong answer, assuming the energy and awareness reactions are modeled into a recurrent neural net.
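A toy version of that healing period, using a recurrent pain trace that outlasts the stimulus (the decay rate and avoidance threshold are arbitrary):

```python
DECAY = 0.9            # illustrative decay per time step
AVOID_THRESHOLD = 0.3  # illustrative

def step(pain_trace, pain_input):
    # The trace spikes with the stimulus, then decays, so avoidance
    # behavior continues after the stimulus itself has ended.
    return max(pain_input, pain_trace * DECAY)

trace = 0.0
for t, stimulus in enumerate([0.0, 1.0, 0.0, 0.0, 0.0, 0.0]):
    trace = step(trace, stimulus)
    print(t, round(trace, 3), "avoid!" if trace > AVOID_THRESHOLD else "")
```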
I may have a partial answer to the question of how pain can be expressed in a neural network. For reference, the base network I use is an HTM (Hierarchical Temporal Memory) algorithm. It is essentially a series of interconnected layers, each predicting its next input, with correct predictions rewarded using Hebbian logic.
Theoretically, there could be some connections between layers that are gated, where the gate can only be opened by sufficient activation in another layer. This other layer would be rigged to learn to recognize new patterns only in the context of the pain trigger. Then, in the presence of a stimulus that anticipates pain, the gated channel would open, creating a simulated attention system for recognizing future pain. While this is not pain in itself, it is similar to fear.
I have a course in my current semester in which I'm required to do a project on an application of AI. I have decided to do it on game AI. I have two basic ideas: implementing FPS bots or implementing soccer AI.
I'm quite a noob at AI right now. I've implemented basic pathfinding algorithms (A*, etc.), studied finite state machines, some first-order logic, and basic neural network material (the backpropagation algorithm), and I'm currently taking a course on genetic algorithms.
Our main focus is on the bot right now. Our plans include:
Each bot would be implemented as a finite state machine (FSM) containing the possible states the bot could be in, and the rules for the actions/state changes that take place when it receives an input.
In bot group movement, each bot would use neural networks to decide whether to strike and how to strike, based on range, the number of bots, and existing fights.
By using genetic algorithms, the opponent's next move could be anticipated based on repetitive moves.
Although I've programmed a few 2D games in my free time (like Pac-Man, Tetris, etc.), I've never really gone into 3D. We will most probably be using a 3D engine.
We want to concentrate most of our energy on the AI part, and we'd rather not be bothered with unnecessary details about animation, 3D models, etc. For example, if we could find a framework with functions like Moveright() that just moves the bot to the right, it would be really awesome.
My basic question is: is it too ambitious to go about it the way we have planned, considering the duration of the project is about 3 months? Should we go 3D and use a 3D game engine? Is it easy to use such engines if you have no experience with them? If so, what kind of engine would be suitable for our project?
I came across another idea, given in the book Programming Game AI by Example, where the player has a top-down view of the bots. Would that approach be more appropriate?
Thanks, and sorry about the length of the question; it's just that my problem is a bit too specific.
My basic question is: is it too ambitious to go about it the way we have planned, considering the duration of the project is about 3 months?
Yes -- but that's not necessarily a bad thing :)
Should we go 3D and use a 3D game engine?
No. Mainly because you said:
We want to concentrate most of our energy on the AI part.
Here's what I'd do, based on my experience (and knowing that, as a student, I often bit off way more than I could chew, too):
Make your simulation function irrespective of a graphical component. Have it publish "updates" to another layer, consisting of player and ball vectors. By doing so you'll keep your AI tasks separate from everything else, which means you have fewer bugs to worry about, and you can also unit test your underlying simulation much more easily.
Take those "updates" and create your first "visualization" layer -- make it the simplest 2D representation possible. It could just be a stream of text lines: "Player 1 has the ball / Player 1 kicked ball at (30,40) with speed 20kph". That will be hard enough for your first pass since you'll be figuring out how to take data published by the simulation and doing something with it.
Your next visualization might add a 2D grid of ANSI graphics (think roguelike) to actually show players and the ball moving. The one after that might be sprites. And so on. Note how you incrementally increase the complexity of your visualization; don't make your first step a jump to a technology (a 3D graphics engine) you've never used before. (You'll never finish your project in that case.)
As for your questions about which route to take (FSMs, NNs, GAs, top-down design): rank your interest in them from most to least (along with the rest of your group) and then tackle them in that order. You might consider doing one style for one team and a different design for the other team. You might want to make your FSM team play against an FSM team that's had an additional tweak, to compare and contrast whether your changes are actually beneficial (you might be surprised to find they make the team worse). Actually, that's where unit testing and splitting the simulation from the visualization come in very, very handy: you should be able to "sim" as many games as you need to get experimental results, without worrying about graphics. You might even run them in batches overnight with scripts.
In general, my advice to you is this: break down your project into the tiniest pieces you can, and tackle them one at a time, so no matter where you're at when time runs out, you'll have something interesting to show off.
You could have a look at guntactyx; that's what I had to use when I did my AI unit at uni.
It takes care of all the display, physics, sound, etc. for you; all you have to do is program your team of bots.
The API includes functions to make a bot move left or right, shoot, hear sounds (like gunshots), etc., and it comes with a few sample bots so you don't start from scratch.
Also, it's quite fun to watch your bots battling your friends' bots :)
A friend of mine is beginning to build a NetHack bot (a bot that plays the roguelike game NetHack). There is a very good working bot for the similar game Angband, but it works partially because of the ease of going back to town and always being able to scum low levels for items.
In NetHack, the problem is much more difficult, because the game rewards ballsy experimentation and is basically built as 1,000 edge cases.
Recently I suggested using some kind of naive Bayesian analysis, in much the same way spam filters work.
Basically the bot would first build a corpus by trying every possible action with every item or creature it finds, and storing that information along with, for instance, how close it came to death, injury, or another negative effect. Over time it seems like you could generate a reasonably playable model.
Can anyone point us in the right direction for a good start? Am I barking up the wrong tree or misunderstanding the idea of Bayesian analysis?
Edit: My friend put up a github repo of his NetHack patch that allows python bindings. It's still in a pretty primitive state but if anyone's interested...
Although Bayesian analysis encompasses much more, the Naive Bayes algorithm well known from spam filters is based on one very fundamental assumption: all variables are essentially independent of each other. For instance, in spam filtering each word is usually treated as a variable, so this means assuming that if the email contains the word 'viagra', that knowledge does not affect the probability that it will also contain the word 'medicine' (or 'foo' or 'spam' or anything else). The interesting thing is that this assumption is quite obviously false when it comes to natural language, but it still manages to produce reasonable results.
Now, one way people sometimes get around the independence assumption is to define variables that are technically combinations of things (like searching for the token 'buy viagra'). That can work if you know specific cases to look for, but in general, in a game environment, it means that you can't remember anything. So each time you move or perform an action, it is completely independent of anything else you've done so far. I would say that even for the simplest games, this is a very inefficient way to go about learning the game.
I would suggest looking into Q-learning instead. Most of the examples you'll find are usually just simple games anyway (like learning to navigate a map while avoiding walls, traps, monsters, etc.). Reinforcement learning is a type of online learning that does really well in situations that can be modeled as an agent interacting with an environment, like a game (or robots). It tries to figure out the optimal action at each state in the environment (where each state can include as many variables as needed, much more than just 'where am I'). The trick then is to maintain just enough state to help the bot make good decisions, without having a distinct point in your state 'space' for every possible combination of previous actions.
To put that in more concrete terms: if you were to build a chess bot, you would probably have trouble if you tried to create a decision policy based on all previous moves, since the set of all possible combinations of chess moves grows really quickly. Even a simpler model of where every piece is on the board is still a very large state space, so you have to find a way to simplify what you keep track of. But notice that you do get to keep track of some state, so that your bot doesn't just keep trying to make a left turn into a wall over and over again.
The Wikipedia article is pretty jargon-heavy, but this tutorial does a much better job of translating the concepts into real-world examples.
The one catch is that you do need to be able to define rewards to provide as the positive 'reinforcement'. That is, you need to be able to define the states that the bot is trying to get to; otherwise it will just continue forever.
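Here's a minimal tabular Q-learning sketch; the toy corridor environment and the hyperparameters are illustrative assumptions, not NetHack-specific values:

```python
import random
from collections import defaultdict

class Corridor:
    """Agent starts at position 0 and is rewarded for reaching position 4."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: -1 (left) or +1 (right)
        self.pos = max(0, min(4, self.pos + action))
        done = self.pos == 4
        return self.pos, (1.0 if done else -0.01), done

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration
ACTIONS = [-1, +1]

def q_learn(env, episodes=500):
    Q = defaultdict(float)  # Q[(state, action)] -> estimated value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit the best known action.
            if random.random() < EPSILON:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, reward, done = env.step(action)
            # Update toward the reward plus the discounted best next value.
            best_next = max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                           - Q[(state, action)])
            state = nxt
    return Q

Q = q_learn(Corridor())
print(max(ACTIONS, key=lambda a: Q[(0, a)]))  # learned: go right (+1)
```

For NetHack you'd replace Corridor with a wrapper around the game and pick a state representation small enough to be tractable.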
There is precedent: the monstrous rog-o-matic program succeeded in playing Rogue and even returned with the Amulet of Yendor a few times. Unfortunately, Rogue was only released as a binary, not source, so it has died (unless you can set up a 4.3BSD system on a MicroVAX), leaving rog-o-matic unable to play any of the clones. It just hangs because they're not close enough emulations.
However, rog-o-matic is, I think, my favourite program of all time, not only because of what it achieved but because of the readability of the code and the comprehensible intelligence of its algorithms. It used "genetic inheritance": a new player would inherit a combination of preferences from a previous pair of successful players, with some random offset, then be pitted against the machine. More successful preferences would migrate up in the gene pool and less successful ones down.
The source can be hard to find these days, but searching "rogomatic" will set you on the path.
I doubt Bayesian analysis will get you far, because most of NetHack is highly contextual. There are very few actions which are always a bad idea; most are also life-savers in the "right" situation (an extreme example is eating a cockatrice: that's bad, unless you are starving and currently polymorphed into a stone-resistant monster, in which case eating the cockatrice is the right thing to do). Some of those "almost bad" actions are required to win the game (e.g. coming up the stairs on level 1, or deliberately falling into traps to reach Gehennom).
What you could try is working at the "meta" level. Design the bot to choose randomly among a variety of "elementary behaviors". Then try to measure how these bots fare. Then extract the combinations of behaviors which seem to promote survival; Bayesian analysis could do that over a wide corpus of games along with their "success level". For instance, if there are behaviors "pick up daggers" and "avoid engaging monsters in melee", I would assume that analysis would show that those two behaviors fit well together: bots which pick up daggers without using them, and bots which try to throw missiles at monsters without gathering such missiles, will probably fare worse.
This somewhat mimics what learning gamers often ask for in rec.games.roguelike.nethack. Most questions are similar to: "should I drink unknown potions to identify them?" or "what level should my character be before going that deep in the dungeon?". Answers to those questions depend heavily on what else the player is doing, and there is no good absolute answer.
A difficult point here is how to measure success at survival. If you simply try to maximize the time spent before dying, then you will favor bots which never leave the first levels; those may live long but will never win the game. If you measure success by how deep the character goes before dying, then the best bots will be archeologists (who start with a pick-axe) in a digging frenzy.
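A sketch of that meta-level setup, with invented behaviors and a fake scoring function standing in for actually playing NetHack:

```python
import random

BEHAVIORS = ["pick_up_daggers", "throw_missiles", "avoid_melee", "descend"]

def random_bot():
    # A bot is just a probability weighting over elementary behaviors.
    weights = [random.random() for _ in BEHAVIORS]
    total = sum(weights)
    return {b: w / total for b, w in zip(BEHAVIORS, weights)}

def run_game(bot):
    # Placeholder for actually playing NetHack with this behavior mix.
    # Here, a fake score that rewards the dagger/missile synergy noted
    # above, just so the sketch runs end to end.
    synergy = bot["pick_up_daggers"] * bot["throw_missiles"]
    return random.gauss(10 * synergy, 1)

corpus = [(bot, run_game(bot)) for bot in (random_bot() for _ in range(100))]
best_bot, best_score = max(corpus, key=lambda pair: pair[1])
print(best_bot)  # behavior mix of the most successful bot
```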
Apparently there are a good number of NetHack bots out there. Check out this listing:
In NetHack, unknown actions usually have a boolean effect -- either you gain or you lose. Bayesian networks are based around "fuzzy logic" values -- an action may give a gain with a given probability. Hence, you don't need a Bayesian network, just a list of "discovered effects" and whether they are good or bad.
No need to eat the Cockatrice again, is there?
All in all, it depends how much "knowledge" you want to give the bot to start with. Do you want him to learn everything "the hard way", or will you feed him spoilers till he's stuffed?
This isn't really a programming question, more of an ideas question. Bear with me.
My sister gave me a well-used Nokia N95. I don't really need it, but I wanted it so I could do some programming for it. It supports a few languages, of which I can do Python.
My question is this: what should I do with it? If I think about it, it has a lot to offer: I can program the GPS, motion sensor, wireless internet, and sound and visual capture; it has a lot of storage space, it plays sound and video, and so on.
The combinations seem limitless. The way I see it, it is a device that is easily always on me, has access to a huge data repository (the internet, and my personal data in it) and can be aware if I'm sitting at home, at work, or moving about somewhere. It could basically read my google calendar to check if I should be somewhere I'm not -- perhaps give me the bus schedule to get to where I should be. It could check if it's close to my home and therefore my home PC bluetooth/wifi. Maybe grab my recent work documents from my desktop computer, along with the latest Daily Show, for the bus journey to work. It could check my library account to see if any of my books are due, and remind me to take them with me in the morning. Set up an alarm clock based on what shift I have marked in my google calendar.
Basically I have a device that can analyze my movements in time (calendars with my data etc) and space (gps, carrier cell ids). By proxy, it could identify context situations -- I can store my local grocery store gps coordinates or cell mast ids and it could remind me to bring coffee.
Like I said, the possibilities seem limitless, and therefore baffling. Does anyone else have these pseudofantastical yearnings to program something like this? Or any similar ideas? How could this kind of device integrate into -- and help -- your life?
I'm hoping we could do some brainstorming.
"Gotta Leave" - A reminder that figures out the bus time, how far you are from a stop on your bus and shows a countdown till you "Could" leave (green), "Should" Leave (yellow), "Must" leave (orange), and "Gotta Run to get there" (red).
As input it needs the number of the bus you want to ride. You turn it on, it finds you, finds your closest few bus stops, estimates your walking speed at 2 mph, and calculates when you need to leave where you are to catch the bus with 5 minutes of waiting or less.
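A sketch of the countdown logic (the tier thresholds are my own guesses; the spec above only fixes the walking speed and the four colors):

```python
from datetime import datetime, timedelta

WALK_MPH = 2.0  # walking speed from the spec above

def leave_status(bus_time, miles_to_stop, now=None):
    now = now or datetime.now()
    walk = timedelta(hours=miles_to_stop / WALK_MPH)
    must = bus_time - walk                # leave now, arrive exactly on time
    should = must - timedelta(minutes=5)  # arrive with 5 minutes to wait
    could = should - timedelta(minutes=5)
    if now <= could:
        return "Could leave (green)"
    if now <= should:
        return "Should leave (yellow)"
    if now <= must:
        return "Must leave (orange)"
    return "Gotta run (red)"

print(leave_status(datetime.now() + timedelta(minutes=20), miles_to_stop=0.5))
```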
You should just pick any one and implement it.
It doesn't matter where you start; what matters is that you actually do start. Don't concentrate on the destination; take a step and see what the journey holds.
Do it for a laugh to start with, and your expectations will be set right both for when you find your killer app and for when you don't.
"Phone home" - an interface to report home if you send a message to your phone that it is lost / stolen. Must be a silent operation from the phone holder's perspective
Options:
Self-destruct mode to save your data from prying eyes
Keep calling with its location every 10 minutes until an unlock is sent indicating the phone has been found.
This is the same problem I face with Android (albeit Java instead of Python). The potential is paralyzing :)
I'd recommend checking out what libraries have already been written for doing cool stuff on that phone, and then building off of them. It's a system that provides inspiration, direction, and a good head start. For instance, on the Android side, I'm fooling around with "zxing", a library that lets you read barcodes via the cellphone's camera. That's its own sub-universe of possibilities, but at least it gives me a direction to go: "do cool things with information about products physically nearby".
"Late for Work" - Determines if you are not at work, buzzes you with a reminder and preps the phone to call into the sick line. Could be used if you are going to be late as well.
Inputs: your sick-line number, the time you should be at work, where your home is, and where your work is.
Optional:
Send a text message
Post to an online in/out board
If you are still at home, sound an alarm
If you are still at home, call in sick; if you are not at home, send an "I'm going to be late" message
Comedy Option:
- If you don't respond to ten alarms, dial 911
To add on to what others have said: come up with some kind of office GPS (via WiFi maybe? Does it have WiFi?) that tells you when you need to go to a meeting.