I could never remember which problem multi-armed bandits actually refer to. I vaguely recalled that a one-armed bandit is a slot machine, but the name never quite stuck. As a non-native speaker, I first learned the word "bandit" from playing RPG games as a kid, so when I found out a slot machine can also be called a one-armed bandit, I just couldn't register it in my mind. It is nice to have Google now. From teacher Google, it turns out the nickname comes from the lever on the side of old slot machines. Moreover, just like a bandit, the slot machine often empties the pockets of regular folks.
Now, back to the multi-armed bandit problem: it is essentially a reinforcement learning problem with no state (or, equivalently, a single state). Imagine a row of slot machines in front of you. Which one should you pick? Initially, you have no idea which machine pays out better. As you play, you must keep deciding whether to explore other machines or exploit what you already know. The key is to strike a good trade-off between exploration and exploitation.
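To make the trade-off concrete, here is a minimal sketch of the classic epsilon-greedy strategy: with probability epsilon the agent explores a random arm, otherwise it exploits the arm with the best estimated payout so far. The arm payout probabilities and the epsilon value below are made-up illustration values, and rewards are simplified to Bernoulli coin flips rather than anything a real slot machine does.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    true_means: hypothetical payout probability of each arm (unknown to
    the agent, only used here to generate rewards).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms       # how many times each arm was pulled
    estimates = [0.0] * n_arms  # running average reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # explore: pick a random arm
            arm = rng.randrange(n_arms)
        else:
            # exploit: pick the arm with the best current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        # incremental update of the running average
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(counts)  # the best arm (index 2) ends up pulled most often
```

A small epsilon means mostly exploiting; a large one means gathering more information at the cost of pulling bad arms more often.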
Below is a really good lecture by Prof. Poupart on the topic.