To understand our pain and our story, you'll have to put yourself in the position of an average IT student.
It is just after lunchtime, and the students gather in front of the elevators on the ground floor to get to the lecture rooms on the higher floors.
Experience has shown that students enter the first elevator that arrives on the ground floor and squeeze in until no one else fits.
However, the destination floors of the students are all different.
At the same time, one or more students who would like to go up or down could be waiting on a higher floor.
The normal elevator stops at each of the destination floors to let the students in and out.
At first glance this may seem sensible. But consider peak times, such as when lectures start in the morning: groups with the same destination usually arrive at the elevator together and fill it up, so stopping at every floor quickly turns out to be inefficient.
An ordinary elevator control would still stop a full elevator at floors where people are waiting, only to disappoint both the waiting students and the students inside.
The waiting students now have to wait even longer, because the elevator has to close its doors and leave the floor before they can press the call button again; otherwise the doors of the full elevator would simply open once more.
Meanwhile, an elevator with free space may no longer stop at that floor at all, because the full one is already there.
The students in the elevator also have to endure this unnecessarily long travel time.
Motivation
The motivation behind controlling an elevator with an AI is, on the one hand, to resolve the problem of long waiting times (both on the floor and in the elevator) and, on the other, to make the elevator control more efficient by reducing unnecessary stops.
Our basic concept
Our basic concept is to develop an intelligent elevator control that improves the coordination of elevators.
This control is intended to predict travel patterns and thereby increase the throughput of people in the elevator system.
However, this should not be limited to a single AI that is used for all possible scenarios.
Instead, we want to offer an individual AI for each elevator setup, optimized for that specific scenario.
You might be asking yourself: why? Why individual AIs for each scenario? Why not one big overall solution?
The answer? Hardware. Developing an agent for general use would require a complex model and correspondingly more powerful hardware. We don't expect people to buy graphics cards and plug them into their elevator controllers just to be able to use our solution.
Our first attempt
Our AI should be able to handle a variable number of elevators, floors and passengers, so we cannot rely on predefined data. Instead, we had to implement an environment in which our AI could be trained with the power of reinforcement learning.
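To give a feel for what such an environment looks like, here is a minimal, simplified sketch using the OpenAI Gym interface we come back to in the conclusion. The class name, the action encoding (up / down / open doors) and the reward scheme are illustrative choices for this post, not our exact implementation.

```python
import gym
import numpy as np
from gym import spaces


class ElevatorEnv(gym.Env):
    """Toy elevator environment with a configurable layout (illustrative)."""

    def __init__(self, num_elevators=2, num_floors=5):
        super().__init__()
        self.num_elevators = num_elevators
        self.num_floors = num_floors
        # Each elevator independently chooses: 0 = up, 1 = down, 2 = open doors.
        self.action_space = spaces.MultiDiscrete([3] * num_elevators)
        # Observation: each elevator's floor plus a "someone waiting?" flag per floor.
        self.observation_space = spaces.MultiDiscrete(
            [num_floors] * num_elevators + [2] * num_floors
        )

    def reset(self):
        self.positions = np.zeros(self.num_elevators, dtype=np.int64)
        self.waiting = np.random.randint(0, 2, size=self.num_floors)
        return np.concatenate([self.positions, self.waiting])

    def step(self, action):
        reward = 0.0
        for i, a in enumerate(action):
            if a == 0:    # move up, clamped to the top floor
                self.positions[i] = min(self.positions[i] + 1, self.num_floors - 1)
            elif a == 1:  # move down, clamped to the ground floor
                self.positions[i] = max(self.positions[i] - 1, 0)
            elif self.waiting[self.positions[i]]:
                # Open the doors: reward picking up a waiting passenger.
                self.waiting[self.positions[i]] = 0
                reward += 1.0
        done = not self.waiting.any()  # episode ends when no one is waiting
        return np.concatenate([self.positions, self.waiting]), reward, done, {}
```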
Q-Learning
When researching reinforcement learning, one will most likely stumble upon the Q-Learning algorithm – and so did we. When training with Q-Learning, an agent tries to maximize the rewards it gains from performing actions within a predefined environment. A state is basically a snapshot of the environment at one moment, and the state space comprises all possible states of the environment. By exploring the state space, the agent fills up the so-called Q-table: a lookup table that stores the maximum expected reward for each action in each state. Once this table is filled, we can determine the best action to choose in any given state. How well our agent performs later on depends on how complete our Q-table is.
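The rule that fills this table can be written in a few lines: after each step, Q(s, a) is nudged toward r + γ · max Q(s', ·). Here is a sketch of the classic tabular update; the action count and hyperparameters are illustrative, and states must be hashable (e.g. tuples).

```python
import numpy as np
from collections import defaultdict

NUM_ACTIONS = 9            # illustrative: 3 choices per elevator, 2 elevators
ALPHA, GAMMA = 0.1, 0.99   # learning rate and discount factor

# The Q-table: one row of expected rewards per (hashable) state.
q_table = defaultdict(lambda: np.zeros(NUM_ACTIONS))


def q_update(state, action, reward, next_state):
    """One tabular Q-Learning step: move Q(s, a) toward r + GAMMA * max Q(s', .)."""
    best_next = np.max(q_table[next_state])
    td_error = reward + GAMMA * best_next - q_table[state][action]
    q_table[state][action] += ALPHA * td_error


def best_action(state):
    """Greedy policy: pick the action with the highest stored Q-value."""
    return int(np.argmax(q_table[state]))
```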
The issue with this attempt was that increasing the complexity of our environment made the number of possible states explode exponentially. That means exploring the state space would take way too much time (and also memory). We observed this after adding just one more passenger to the environment. At this point we knew we would need a model that makes predictions based on probabilities rather than 'brute forcing' its way through the entire state space.
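To make that growth concrete, here is a rough back-of-envelope count based on our simplified sketch above (not the exact state encoding we used): if each of e elevators is on one of f floors, and each passenger is described by a location (one of f floors or e elevators) and a destination floor, every additional passenger multiplies the number of states by roughly (f + e) · f.

```python
def approx_states(elevators: int, floors: int, passengers: int) -> int:
    """Rough upper bound on the number of distinct environment states."""
    elevator_states = floors ** elevators   # each elevator is on some floor
    passenger_states = ((floors + elevators) * floors) ** passengers
    return elevator_states * passenger_states


for p in range(4):
    print(p, approx_states(elevators=2, floors=5, passengers=p))
# 0 25
# 1 875
# 2 30625
# 3 1071875   -> every extra passenger multiplies the table size by 35
```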
Deep Q-Learning - A new beginning!
Since training on large amounts of data at once can be implemented much more efficiently with neural networks than with a lookup table, we switched our project to neural networks and the technique of Deep Q-Learning.
With Deep Q-Learning, the neural network receives all the information about the environment as input, possibly extended with temporarily stored information such as the states and actions of the last n steps. As output, an |A|-dimensional vector is expected, containing the Q-values for all actions.
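Such a network can be quite small. The following is a generic example in PyTorch (the framework and layer sizes here are purely illustrative, not necessarily our exact architecture); a state of seven values matches the two-elevator, five-floor sketch above, and nine actions match the flattened joint action space.

```python
import torch
import torch.nn as nn


class DQN(nn.Module):
    """Maps a flattened environment state to one Q-value per action."""

    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # the |A|-dimensional output vector
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


# Greedy action selection: pick the action with the highest predicted Q-value.
policy = DQN(state_dim=7, num_actions=9)
state = torch.zeros(1, 7)                  # a dummy observation
action = policy(state).argmax(dim=1).item()
```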
This new approach led to a huge improvement in performance and feasibility: for the first time, we could train the agent with more than one passenger.
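For readers curious how training proceeds, here is a condensed sketch of a Deep Q-Learning loop with epsilon-greedy exploration and an experience replay buffer, reusing the ElevatorEnv and DQN sketches above. A full DQN usually also keeps a separate target network, which we omit for brevity; all hyperparameters are illustrative.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

env = ElevatorEnv(num_elevators=2, num_floors=5)
policy = DQN(state_dim=7, num_actions=9)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)          # experience replay buffer
EPSILON, GAMMA, BATCH = 0.1, 0.99, 32

state = torch.tensor(env.reset(), dtype=torch.float32)
for step in range(1000):
    # Epsilon-greedy: explore occasionally, otherwise act greedily.
    if random.random() < EPSILON:
        action = random.randrange(9)
    else:
        action = policy(state.unsqueeze(0)).argmax(dim=1).item()

    obs, reward, done, _ = env.step(divmod(action, 3))  # decode the joint action
    next_state = torch.tensor(obs, dtype=torch.float32)
    replay.append((state, action, reward, next_state, done))
    state = torch.tensor(env.reset(), dtype=torch.float32) if done else next_state

    if len(replay) >= BATCH:
        batch = random.sample(replay, BATCH)
        states, actions, rewards, next_states, dones = zip(*batch)
        states, next_states = torch.stack(states), torch.stack(next_states)
        actions = torch.tensor(actions)
        rewards = torch.tensor(rewards, dtype=torch.float32)
        dones = torch.tensor(dones, dtype=torch.float32)

        # Bellman target: r + GAMMA * max_a' Q(s', a') for non-terminal states.
        with torch.no_grad():
            target = rewards + GAMMA * policy(next_states).max(dim=1).values * (1 - dones)
        q_values = policy(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = F.mse_loss(q_values, target)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```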
We let the agent train in an environment consisting of two elevators, five floors and a randomly generated number of people for many, many hours.
After one night of continuous training, we could already see an improvement in the agent's performance; it seemed to have learned some basic logical reasoning.
We are confident the agent would be even more successful with some more training.
Conclusion
It was a really cool project, and we had a lot of fun working on this problem and overcoming its challenges. When implementing our second environment, we could profit from the knowledge and experience of other users and the OpenAI toolkit. Additionally, by using OpenAI's Gym library, we could draw many parallels to other projects while implementing our model. We learned how to train our agent to operate and navigate within a complex environment with multiple possible actions and their associated outcomes.
In the future we would like to compare the performance and throughput of our model with those of existing elevator systems, for example the one in our university.
To improve our AI, the next step would be to train our system on empirical data reflecting the daily usage patterns of a building.