3. Started as a $30,000 prototype Vision: Shift the world from thinking “We need to understand technology” to “Technology needs to understand us”
21. How does Kinect know what I do? “Xbox?!” “Let’s Play!”
22. Microsoft Research: Object Recognition J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. European Conference on Computer Vision, 2006
I’d like to introduce you to Kinect for Xbox 360, where YOU are the controller. No gadgets, no gizmos, just you! Kinect brings games and entertainment to life in extraordinary new ways without using a controller. Imagine controlling movies and music with the wave of a hand or the sound of your voice. With Kinect, technology evaporates, letting the natural magic in all of us shine. http://www.xbox.com/en-US/kinect
A few inspiration points from the creators of Kinect.
So who likes playing video games? Who thinks gaming controllers are really easy to use? How long do you think it would take you to become an expert at all of these buttons and win games? If you could just turn on the game, play and be pretty good at it, do you think you’d play more video games? The purpose of Kinect is to make Xbox accessible to a broader audience. The Kinect team focused on making Xbox so easy to use that anyone could jump in and play without reading any instructions or learning all the different controller buttons and combinations needed to be great at the game. They wanted to make beginners feel like experts. Kinect is designed so anyone can play: kid or adult, no matter how old you are or how much gaming experience you have, you can jump in and play right away. Imagine your little brother or sister, or your grandparents, playing an Xbox game without having to learn which button does what!
So, as we said in the last slide, instead of learning all the right buttons to click on the console, make the game understand YOU. That’s Kinect! Make gaming more accessible. Open up gaming to others. Use what you know. Don’t need to learn. But there’s also another unique element to Kinect and that is making gaming more social. Traditionally you would have your hard core gamers sitting alone in front of their game with their console firing away at the next alien, racing away in their own world for hours, etc. With Kinect, gaming is actually bringing people together in a fun, collaborative way, where watching your friends and family play is actually really entertaining. And playing with others using Xbox Live is a very social gaming experience. People are laughing and joining in even if they aren’t playing, so much that they want to get up and play themselves.
What is Kinect? Let’s start with the name… Where did the name Kinect come from? “Kinetic”, which means to be in motion, and “connect”, meaning it “connects you to the friends and entertainment you love”! Kinect has Voice Recognition: Kinect uses four strategically placed microphones within the sensor to recognize and separate your voice from the other noises in the room, so you can control movies and more with your voice. Kinect has Gesture Recognition, through a Motion Sensor: Kinect uses a motion sensor that tracks your entire body. So when you play, it’s not only about your hands and wrists. It’s about all of you: arms, legs, knees, waist, hips and so on. It also includes Skeletal Tracking: as you play, Kinect creates a digital skeleton of you based on depth data. So when you move left or right or jump around, the sensor will capture it and put you in the game. Kinect has Facial Recognition: Kinect ID remembers who you are by collecting physical data that’s stored in your profile. So when you want to play again, Kinect will know it’s you, making it easy to jump in whenever you want. In a nutshell – YOU Recognition!
Build out this slide – - Kinect knows what to do - The camera captures you and your movements, voice, etc. - It’s programmed to analyze images, look for basic human form and identify about 32 essential body parts such as your head, torso, hips, knees, elbows and thighs. - Create your Avatar - You’re ready to play!
Let’s have a look at the Kinect Sensor. What are those things on the sensor? There’s an RGB camera, a depth sensor and a multi-array microphone. When you first start up Kinect, it reads the layout of your room and configures the play space you'll be moving in. Then, Kinect detects and tracks 32 points on each player's body, mapping them to a digital reproduction of that player's body shape and skeletal structure, including facial detail. Let’s take a look at each component separately to help you understand how it all works together… [next few slides go into more detail]
An infrared projector combined with a monochrome CMOS sensor allows Kinect to see the room in 3-D (as opposed to inferring the room from a 2-D image) under any lighting conditions. Depth is determined by projecting invisible infrared (IR) dots into a room. Let’s see how that might look…(next slide)
Source: www.ros.org Depth is recovered by projecting invisible infrared (IR) dots into a room. The way the optical system works, on a hardware level, is fairly basic. A class 1 laser is projected into the room. The sensor is able to detect what's going on based on what's reflected back at it. Together, the projector and sensor create a depth map. You can see in this picture that the couch is farther away from the Kinect sensor than the player’s hand, so the infrared dots on the couch aren’t as bright white as those on the person. This is also very helpful when there are others in the room watching the game. The Kinect sensor will use the depth sensor to determine that the person sitting on the couch in the distance isn’t playing the game, so their movements won’t interfere with the player’s movements. 320×240 depth stream
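The idea of using depth to tell the player apart from someone sitting on the couch can be sketched in a few lines of Python. This is only an illustration, not Kinect's actual algorithm: the function name segment_player, the max_player_depth_mm cutoff and the toy depth values are all made up for this example.

```python
def segment_player(depth_map, max_player_depth_mm=2500):
    """Return a mask that is True where a pixel is close enough to be the player.

    depth_map: list of rows, each a list of distances in millimetres
               (0 means the sensor got no reading for that pixel).
    max_player_depth_mm: hypothetical cutoff; anything farther is background.
    """
    return [
        [0 < d <= max_player_depth_mm for d in row]
        for row in depth_map
    ]


# Toy 3x4 depth map: a player about 2 m away, a couch about 3.5 m away,
# and one pixel with no reading.
depth_map = [
    [3500, 2000, 2000, 3500],
    [3500, 2000, 2000, 3500],
    [3500, 3500,    0, 3500],
]
mask = segment_player(depth_map)
player_pixels = sum(cell for row in mask for cell in row)
print(player_pixels)
```

A single fixed cutoff is the simplest possible choice; a real system would have to cope with players at different distances and with noisy readings.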
There’s also an RGB Camera. Does anyone know what RGB means? This video camera aids in facial recognition and other detection features by detecting three color components: Red, Green and Blue. The "RGB camera" is referring to the color components it detects. It’s similar to the web cam you see on computers and laptops today and it’s used for the sharing memories feature of Kinect which captures pictures while you’re playing! It is also used for Video Kinect which we’ll talk about a little later. What else do you think is part of the Kinect sensor?
The sensor also has EARS!! The Multi-array microphone is an array of four microphones that can isolate the voices of the players from the noise in the room. This allows the player to be a few feet away from the microphone and still use voice controls. These microphones focus on the sound we care about and throw away the noise. When you first plug in Kinect it steps through an acoustic setup. Kinect bounces sound around the room and listens to how it comes back in order to acoustically map your room. There is also a voice recognition component of Kinect. Most voice recognition available today is push to talk. No buttons with Kinect: you can talk to the sensor and it recognizes speech!
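One textbook way for a microphone array to "focus" on a talker is delay-and-sum beamforming: shift each microphone's signal so the voice lines up across all of them, then average. The notes above do not say which technique Kinect uses, so treat the Python sketch below as an assumption chosen for illustration; the function name, the sample values and the delays are made up.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its delay (in samples) and average them.

    channels: list of equal-length lists of samples, one per microphone.
    delays:   how many samples later the voice arrives at each microphone.
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        total = 0.0
        for samples, delay in zip(channels, delays):
            idx = t + delay
            if idx < n:  # samples shifted past the end count as silence
                total += samples[idx]
        out.append(total / len(channels))
    return out


# A short "voice" pulse that reaches four microphones 0, 1, 2 and 3 samples late.
voice = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
delays = [0, 1, 2, 3]
channels = [[0.0] * d + voice[: len(voice) - d] for d in delays]

aligned = delay_and_sum(channels, delays)          # steered at the talker
unaligned = delay_and_sum(channels, [0, 0, 0, 0])  # no steering
print(aligned[1], max(unaligned))
```

With the right delays the pulse adds up to full strength, while without steering it is smeared across four samples at a quarter of the strength. Sound arriving from other directions would be smeared in the same way, which is the sense in which the array "throws away" noise.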
There’s also a motorized tilt. The Kinect sensor will adjust using this motorized tilt so it can recognize all shapes and sizes of players. When you first turn on Kinect, you’ll see the sensor move up and down to find the players.
Color VGA video camera - This video camera aids in facial recognition and other detection features by detecting three color components: red, green and blue. Microsoft calls this an "RGB camera" referring to the color components it detects. Depth sensor - An infrared projector and a monochrome CMOS (complementary metal-oxide semiconductor) sensor work together to "see" the room in 3-D regardless of the lighting conditions. Complementary metal–oxide–semiconductor (CMOS) (pronounced /ˈsiːmɒs/) is a technology for constructing integrated circuits. CMOS technology is used in microprocessors, microcontrollers, static RAM, and other digital logic circuits. CMOS technology is also used for several analog circuits such as image sensors, data converters, and highly integrated transceivers for many types of communication. Multi-array microphone - This is an array of four microphones that can isolate the voices of the players from the noise in the room. This allows the player to be a few feet away from the microphone and still use voice controls. What comes in the box: Kinect sensor for Xbox 360; Power supply cable; User's manual; Wi-Fi extension cable; Kinect Adventures game. Specifications: Color VGA Motion Camera, 640 x 480 pixel resolution at 30 FPS; Depth Camera, 640 x 480 pixel resolution at 30 FPS; Array of 4 microphones supporting single speaker voice recognition. Put it all together with a VERY IMPORTANT piece that makes it all possible – SOFTWARE!! Kinect's software layer is the essential component to add meaning to what the hardware detects. When you first start up Kinect, it reads the layout of your room and configures the play space you'll be moving in. Then, Kinect detects and tracks 32 points on each player's body, mapping them to a digital reproduction of that player's body shape and skeletal structure, including facial details.
http://electronics.howstuffworks.com/microsoft-kinect3.htm http://www.popsci.com/gadgets/article/2010-01/exclusive-inside-microsofts-project-natal Kinect Software Learns from "Experience" Kinect's software layer is the essential component to add meaning to what the hardware detects. When you first start up Kinect, it reads the layout of your room and configures the play space you'll be moving in. Then, Kinect detects and tracks 48 points on each player's body, mapping them to a digital reproduction of that player's body shape and skeletal structure, including facial details [source: Rule]. In an interview with Scientific American, Alex Kipman, Microsoft's Director of Incubation for Xbox 360, explains Project Natal's approach to developing the Kinect software. Kipman explains, "Every single motion of the body is an input," which creates seemingly endless combinations of actions [source: Kuchinskas]. Knowing this, developers decided not to program those seemingly endless combinations into pre-established actions and reactions in the software. Instead, they would "teach" the system how to react based on how humans learn: by classifying the gestures of people in the real world. To start the teaching process, Kinect developers gathered massive amounts of data from motion capture in real-life scenarios. Then, they processed that data using a machine-learning algorithm by Jamie Shotton, a researcher at Microsoft Research Cambridge in England. Ultimately, the developers were able to map the data to models representing people of different ages, body types, genders and clothing. With select data, developers were able to teach the system to classify the skeletal movements of each model, emphasizing the joints and distances between those joints. An article in Popular Science describes the four steps Kinect's "brain" goes through 30 times per second to read and respond to your movements [source: Duffy].
The Kinect software goes a step further than just detecting and reacting to what it can "see." Kinect can also distinguish players and their movements even if they're partially hidden. Kinect extrapolates what the rest of your body is doing as long as it can detect some parts of it. This allows players to jump in front of each other during a game or to stand behind pieces of furniture in the room.
In 2008 someone from Xbox called Microsoft Research. They had seen the published human body tracking work highlighted on the previous slide, and they said they needed a computer body tracker for one of their new Xbox games. They talked about all of the other things they wanted this tracker to be able to do: it needed to track all body motions, it needed to be 10 times faster than real-time, it had to support multiple players and it had to be 3D. They asked if MSR could help them build it. Well, Microsoft Research said it couldn’t be done. But the Xbox team had some game programmers who had already been trying to develop a system that could do human body tracking. They sent Microsoft Research a video of what they had developed, and the research team was truly inspired by what they saw. So they teamed up and decided to make this work! Imagine those teams getting together: PhDs from Microsoft Research meet Xbox gaming developers… those must have been some awkward first meetings!!
The first thing they did was collect a lot of data. Xbox sent a team of people to households in about 10 countries, where they went into people's living rooms and asked them to pretend they were playing a game while being recorded on video. They captured terabytes of information. That gave them data on different sizes of living rooms, different backgrounds and different sizes of people. They then went to a Hollywood motion capture studio and asked them to generate billions of computer-generated images of humans based on the many different hairstyles, clothing, poses, lighting, shapes and sizes the team had collected across the globe. They took all of this data and used it to teach the computer. See examples of the training data in the next slide. (details highlighted in this article) http://www.popsci.com/gadgets/article/2010-01/exclusive-inside-microsofts-project-natal
Here are some examples of the training data (images of different human poses). The idea was this – if they can feed the computer enough data—in this case, millions of images of people—it can learn for itself how to understand it. That saves programmers the near-impossible task of coding rules that describe all the zillions of possible movements a body can make.
http://research.microsoft.com/en-us/projects/DryadLINQ/ DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data parallel applications running on large PC clusters. So, the painstaking task of the Xbox team (the gathering of pictures of people in many different poses) generated the massive amounts of training data. They ran this data through huge clusters of computers (shown here) where the learning “brain” of Kinect resides to “learn” the many different human body movements.
The part of Kinect that the player sees looks like a webcam, but it’s the software inside, which Microsoft casually refers to as “the Brain”, that makes sense of the images captured by the camera. It’s programmed to analyze images, look for basic human form and identify about 30 essential body parts such as your head, torso, hips, knees, elbows and thighs. What's the brain thinking as it watches you jump around, swinging imaginary bats or head-butting imaginary soccer balls? As you stand in front of the camera, it judges the distance to different points on your body. Then the brain guesses which parts of your body are which. So you can see here in this image that the bold colored boxes are the probable guesses: the green square is the player’s head, the pink and light blue squares are the player’s hands, etc.
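The "guess which part is which" step can be pictured with a small Python sketch. It assumes we already have, for each pixel, a probability for each body part; in the real system those guesses come from the trained software, while here the numbers, the coordinates and the function names label_pixels and part_center are invented for illustration.

```python
def label_pixels(pixel_probs):
    """Pick the most probable body part for each pixel.

    pixel_probs: dict mapping (x, y) -> dict of body part -> probability.
    """
    return {xy: max(probs, key=probs.get) for xy, probs in pixel_probs.items()}


def part_center(labels, part):
    """Average the coordinates of every pixel labelled as the given part."""
    points = [xy for xy, name in labels.items() if name == part]
    if not points:
        return None
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return (cx, cy)


# Made-up probabilities for three pixels and two body parts.
pixel_probs = {
    (4, 0): {"head": 0.9, "hand": 0.1},
    (6, 0): {"head": 0.8, "hand": 0.2},
    (0, 5): {"head": 0.3, "hand": 0.7},
}
labels = label_pixels(pixel_probs)
print(part_center(labels, "head"))
```

Averaging the pixels that share a label gives one point per body part, which is roughly what the colored boxes on the slide represent.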
Once Kinect has determined it has enough certainty about enough body parts to pick the most probable skeletal structure, it outputs that shape to a simplified 3D avatar (you can see the avatar images on the bottom right). Then it does this all over again, 30 times a second! As you move, the Kinect “brain” generates all possible skeletal structures at each frame, eventually deciding on, and outputting, the one that is most probable. This thought process takes just a few milliseconds, so there's plenty of time for the Xbox to take the info and use it to control the game. Here’s the programmer’s view of the different images and probabilistic matching going on to eventually give you your Kinect Avatar!
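Picking "the one that is most probable" can be sketched as scoring each candidate skeleton and keeping the best. Scoring a candidate by multiplying its per-joint confidences is an assumption made for this example, not a description of Kinect's real scoring, and the candidate names and numbers are invented. The last lines work out how much time one frame gets at 30 frames per second.

```python
import math


def most_probable_skeleton(candidates):
    """Return the name of the candidate whose joint confidences multiply highest.

    candidates: dict mapping candidate name -> list of per-joint confidences (0..1).
    """
    return max(candidates, key=lambda name: math.prod(candidates[name]))


# Two made-up candidate poses with confidences for three joints each.
candidates = {
    "arms_up":   [0.9, 0.8, 0.9],
    "arms_down": [0.9, 0.3, 0.4],
}
best = most_probable_skeleton(candidates)

# At 30 frames per second, each frame has about 33 ms in total.
frame_budget_ms = 1000 / 30
print(best, round(frame_budget_ms, 1))
```

If the skeleton step takes only a few milliseconds out of roughly 33, most of each frame is left over for the game itself.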
The end result = the game platform is born!
Before we start playing, let’s see what type of Play Space is recommended for Kinect. Kinect needs to be able to see your entire body. - Clear the area between the sensor and the players. - If there is only one player: Stand back 6 feet (1.8 m). - If there are two players: Stand back 8 feet (2.4 m). - Make sure that the play space is at least 6 feet (1.8 m) wide, and not wider or longer than 12 feet (3.6 m).
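The play space guidelines above can be turned into a small checker. This is a hypothetical helper written for these notes, not part of any Kinect software; the function name and the wording of the messages are made up, and the numbers are the ones from the slide, in feet.

```python
def check_play_space(players, distance_ft, width_ft):
    """Return a list of problems with the play space; an empty list means it fits."""
    problems = []
    # One player stands back 6 ft; two players stand back 8 ft.
    min_distance = 6 if players == 1 else 8
    if distance_ft < min_distance:
        problems.append(f"stand back at least {min_distance} ft")
    if distance_ft > 12:
        problems.append("play space is longer than 12 ft")
    if width_ft < 6:
        problems.append("play space is narrower than 6 ft")
    if width_ft > 12:
        problems.append("play space is wider than 12 ft")
    return problems


print(check_play_space(players=1, distance_ft=6, width_ft=8))
print(check_play_space(players=2, distance_ft=6, width_ft=8))
```

The first call describes a room that fits a single player; the second reports that two players standing 6 feet back are too close.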
You’ll also need to be sure that the lighting in the room is good enough to be able to detect the players. Good lighting - Make sure your room has enough light so that your face is clearly visible and evenly lit. Try to minimize side or back lighting, especially from a window. - Illuminate players from the front, as if you were taking a picture of them. - Make sure the room is brightly lit. Poor lighting - Some lighting conditions can make it difficult for Kinect to identify you or track your movements. - For best results, avoid positioning either the players or the sensor in direct sunlight.
There are also some clothing considerations to keep in mind. As we learned earlier, the sensor is detecting points on each player’s body. If clothing hides any of those points (for example, a skirt may hide your knees), the player may have difficulty playing. [review other bullets above]
Kinect with more than just games: With Xbox LIVE, a whole world of extraordinary entertainment experiences awaits, including streaming music, HD movies, live sporting events, Facebook, Twitter, Video chat and more. Use your voice or a wave of your hand to: - Video Kinect with others* - Manage your media gallery - Music with Last.fm* - HD movies with Zune - Get in the game with ESPN*
Here’s an example of Video Kinect. Two families, one in LA and one in Dallas, are talking to each other over Video Kinect.
The families watching a video together.
You can also navigate through HD movies with Kinect and Zune.