Saturday, March 30, 2013

The Importance of Hand Tracking - Part I

Virtual hand from the SOFA project
I continue to think that hand and arm tracking is a critical component in the adoption of virtual reality technologies. VR goggles that are simply a 'monitor on the head' are going to be difficult to use because interaction with the VR applications will often require the user to reach for a mouse, keyboard or game controller. This limits the mobility of the user, and these input devices are simply difficult to find while wearing an immersive device.

What can you do with hand tracking?

The ability to track position (x/y/z) and orientation (yaw/pitch/roll) of hands enables many things:

  • A very natural way to interact with virtual objects. Fine-grained interaction might also require understanding finger positions and grab/release gestures, but just being able to naturally reach out to a virtual object seems to me to be really important.
  • Another option to choose between menu items and other user-interface options. A 'yes or no' question could ideally be answered in many ways: by touching a virtual 'yes' button in the air (see the sketch after this list), by nodding the head to indicate 'yes', by saying 'yes' into a microphone or even by showing a thumbs-up sign. Just like you can answer a 'yes or no' question on a computer screen with mouse, keyboard or speech, you should be able to do the same and more in a virtual world.
  • Locating the hands will continue to be useful in fitness or sports games, as many of the Microsoft Kinect games have demonstrated. Imagine being a quarterback where your arms are tracked so the game can understand the trajectory and velocity of your throw as well as your release point.
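To make the 'virtual yes button' idea from the list above concrete, here is a minimal sketch. The data layout, names and dimensions are my assumptions for illustration, not any particular tracking SDK: a tracked hand pose (position plus yaw/pitch/roll) and a simple test for whether the hand has reached a button floating in the air.

```python
# Minimal sketch (assumed names and numbers): a tracked hand pose and a
# proximity test against a virtual 'yes' button placed in mid-air.
from dataclasses import dataclass
import math

@dataclass
class HandPose:
    x: float          # position in meters, in room coordinates
    y: float
    z: float
    yaw: float        # orientation in degrees
    pitch: float
    roll: float

@dataclass
class VirtualButton:
    x: float
    y: float
    z: float
    radius: float     # how close the hand must come to count as a press

def touches(hand: HandPose, button: VirtualButton) -> bool:
    """True if the tracked hand is within the button's activation radius."""
    distance = math.sqrt((hand.x - button.x) ** 2 +
                         (hand.y - button.y) ** 2 +
                         (hand.z - button.z) ** 2)
    return distance <= button.radius

# A 'yes' button floating 1.4 m above the floor, half a meter in front of the user:
yes_button = VirtualButton(x=0.3, y=1.4, z=0.5, radius=0.05)
print(touches(HandPose(0.31, 1.41, 0.49, 0, 0, 0), yes_button))  # True
```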

Hand tracking approaches

Historically, hand and arm tracking involved wearing body suits, special markers or active sensors on the body, such as this one from XSENS used to create the movie "Ted":

  


An alternative approach uses an array of cameras in a reasonably sterile space such as the demonstration below from Organic Motion:


While these demos are very nice, they require lots of equipment, clean spaces and special gear. The XSENS demo uses active sensors, which means a user needs a power source (e.g. batteries) and needs to strap the sensors onto the body - great for special applications, but not simple and inexpensive enough for consumer use.


Tracking technologies that are certainly more consumer-friendly are those that use one or two cameras. The Kinect uses a structured-light approach, which projects a known light pattern onto the objects to be tracked and then analyzes it with a camera. Dual-camera solutions such as this and this are essentially 3D cameras that correlate objects across the views of cameras with a known relative position and then calculate the position of the object. Leap Motion uses three light sources and two cameras in a compact sensor. Other sensors use the time-of-flight method, which is similar to how radar works, to measure the distance of an object from the sensor. Below is a nice demo from the Intel Perceptual Computing initiative which uses a time-of-flight sensor:


Some technologies such as the Kinect produce a depth map - essentially a representation of how far each pixel in the image is from the camera - and then use sophisticated algorithms to turn this into a skeleton and try to understand what this skeleton means. Depending on how smart these algorithms are, the left hand may easily be mistaken for the right hand or, in the worst cases, for a leg or something else.
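As a rough illustration of where each value in such a depth map comes from in a dual-camera sensor: once the same object is located in the left and right images, its distance follows from the disparity between the two views. Below is a minimal sketch; the focal length, baseline and disparity figures are illustrative assumptions, not the specs of a real sensor.

```python
# Minimal sketch of the dual-camera (stereo) idea using the pinhole model:
# distance Z = focal_length * baseline / disparity.
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Distance (in meters) to an object seen by two cameras a known distance apart."""
    if disparity_px <= 0:
        raise ValueError("object must appear shifted between the two camera views")
    return focal_length_px * baseline_m / disparity_px

# An object shifted 25 pixels between cameras 6 cm apart, with a 700-pixel focal length:
print(depth_from_disparity(focal_length_px=700, baseline_m=0.06, disparity_px=25))  # ~1.68 m
```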

In contrast, marker-based technologies don't extract the full skeleton but rather the known locations of certain markers that are attached to specific parts of the body. Because each marker is known in advance, there is little chance that the left and right hands are confused. However, markers have the disadvantage that they must be worn and calibrated.

Below are examples of marker tracking from WorldViz (shown together with a Sensics HMD) and ART (where the markers are placed on the HMD and the controller to obtain precise position and orientation):
Marker tracking on an HMD, from IEEE VR 2013

Motion tracking with a body suit, photographed at SIGGRAPH

One additional option is simply holding a tracker in your hand. For instance, our friends at Sixense have created low-cost positioning technology that is implemented in the Razer Hydra. With it, the controller you hold in your hand is tracked. The upside is that to track your hand, you simply need to grab a controller, and while holding it you have access to various pushbuttons. The downside is that your hands are no longer free for other things. Below is a demonstration of using these controllers to build a virtual reality world:


In the next post, we will discuss the applicability of these technologies to virtual reality goggles.




Sunday, March 24, 2013

Bill Davidow: "Virtual Reality is Addictive and Unhealthy"

Illustration: Eddie Guy for IEEE Spectrum
In August of last year, Bill Davidow, Partner Emeritus at the venture capital firm Mohr Davidow Ventures and former SVP of sales and marketing at Intel, published an opinion piece in IEEE Spectrum magazine called "Virtual Reality is Addictive and Unhealthy". Among other things, Mr. Davidow writes that:
"... I see people walking down the street, eyes fixed on the screens of their mobile phones, ears plugged into their iPods, oblivious to their surroundings…to reality itself. They are not managing their tools; their tools are managing them. Tools now make the rules, and we struggle to keep up."
The focus of the article is not virtual reality as in "virtual reality goggles" but rather the extra dimension of cyberspace that provides the ability to immediately access information from all corners of the earth. But, given its title, it's fair to ask about goggles: are they also addictive and unhealthy?

Addictive? Probably. Video games are addictive, and video games that are turbocharged by immersive virtual reality goggles are at least as compelling as the equivalent 'flat screen' experience.

Unhealthy? Maybe too soon to tell. One could address this on several different time-scales:

  • Minutes: a virtual reality goggle that presents a virtual world with a bad head-tracking system can cause nausea within minutes. I think we are just starting to scratch the surface of understanding how to create a pleasant experience in VR goggles. How does the non-continuous nature of objects refreshed on the screen impact our perception (see Michael Abrash's blog post for one such angle)? How should 3D hand-held controllers be represented in virtual space? Once a feeling of discomfort arises, does it go away after a few more minutes? Are our brains 'perfectly elastic' in this sense, or will there be a lingering effect?
  • Hours: what are the risks of playing an immersive game for hours on end? Physical injury is one risk. Side effects to a 3D experience should also be considered.
  • Weeks and beyond: what are the long-term effects of being immersed in goggles? Most VR applications today limit exposure to an hour or less. Even an intense soldier training session will not last hours and hours. What happens when people start using these devices over long periods of time in an unsupervised environment?
Of course, there is the social - or anti-social - aspect of goggles. To me, this is particularly relevant for augmented reality goggles such as the upcoming Google Glass. How do you feel speaking with someone wearing the goggles when you are unsure if they are paying attention? As Davidow notes:
" the quickest way to end a deep and meaningful conversation was to glance at your watch. What would he say today about our ever more tempting smartphones?"
At least with a smartphone, you usually know when your conversation partner glances at it. With goggles this will be much harder to tell.

Thursday, March 21, 2013

The Barriers for Consumer Virtual Reality may not be What You Think

I'm back from the IEEE VR conference in Orlando, FL, where I've had a chance to speak with many experts regarding perceived barriers to virtual reality adoption in the consumer market.

The need has always been there and almost any person has ideas on what virtual reality goggles can be used for: games, architecture, relaxation, entertainment, rehabilitation and many others. For many years, movies have depicted this as well.

So what is the barrier? Why did it not happen until 2013, and given recent developments, what are the current barriers?

For a long time, the barrier was assumed to be hardware. Solutions such as this, this, this and this were hurt by some combination of low resolution, narrow field of view, lack of head tracking and price. Not every product suffered from every deficiency, but taken as a whole, the experience was just not compelling enough to justify the price for many people. I remember buying one of these products and bringing it home for my kids. After a few minutes of "wow, it's cool!", they put it down and never used it again. The experience was not compelling enough for them, even though they did not have to pay a dime to get it. From what I've learned since, this was typical.

Then, high-resolution smartphone displays came along, and VR goggles with a wide field of view could easily be created, even as a DIY project. Granted, the resolution of these goggles is currently low, and given the wide field of view the pixel density is poor (i.e. grainy pixels unless blurred by so-so optics), but they are still good for games and other applications where such a goggle can provide the thrill of immersion. Over time, higher- and higher-resolution smartphone displays will become available, overcoming the grainy image. Improved optical designs might also choose to increase pixel density by somewhat lowering the field of view, so alternatives will exist.
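To put the pixel-density trade-off in numbers, here is a minimal sketch; the panel resolution and field-of-view figures are illustrative assumptions, not the specs of any particular goggle.

```python
# Minimal sketch: angular resolution in pixels per degree for a display panel
# viewed through optics with a given horizontal field of view.
def pixels_per_degree(horizontal_pixels, horizontal_fov_deg):
    """Angular resolution of a display seen through optics with a given field of view."""
    return horizontal_pixels / horizontal_fov_deg

# A 1280x720 phone panel split between two eyes (640 px per eye) behind 90-degree optics:
print(pixels_per_degree(640, 90))   # ~7 pixels per degree - grainy
# The same panel behind narrower 45-degree optics doubles the pixel density:
print(pixels_per_degree(640, 45))   # ~14 pixels per degree - sharper, but less immersive
```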

Let's thus assume the hardware issue is either solved or at least on the path to being solved. Would the next barrier be content or software? Probably not. There are plenty of 3D videos and an increasing number of games and game engines that support 3D stereo viewing on such smartphone-based goggles. Sure, there are some adjustments to be made such as location and size of menus, but these are tactical and not strategic problems.

A bigger problem is how a user - a gamer in this example - interacts with the game. An immersive goggle hides the outside world. Now you need to find the game controller or keyboard, make sure your hands stay on them, and hope that you don't become sea-sick from the disconnect between what your brain and eyes experience and what your body feels. First-person shooter games need to change many aspects of the interaction and come up with a comfortable control mode, and the right answers are still unclear. For instance, when Valve launched a version of Team Fortress 2 for VR, they offered various control modes:

  • 0: aiming and steering with your face, the mouse just rotates your “hips”. This is a good mode for use with a control pad.
  • 1: aiming with your face, steering only with the mouse. This mode may be buggy and “drift” after a while.
  • 2, 3, 4: slightly different versions of aiming with the mouse within a “keyhole” in your view (see the sketch after this list). 3 is the default that TF2 ships with.
  • 5, 6, 7: assorted other experiments.
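For the "keyhole" modes, here is a minimal sketch of the idea, reduced to yaw only. The names, threshold and exact behavior are my assumptions for illustration, not Valve's implementation: the mouse moves the aim freely within a small cone around the view center, and pushing past the edge of that cone drags the body ("hips") yaw along with it.

```python
# Minimal sketch (assumed behavior): yaw-only "keyhole" aiming.
KEYHOLE_HALF_ANGLE = 15.0  # degrees the aim may wander from view center (assumed value)

def update_yaw(body_yaw, aim_yaw, mouse_dx, head_yaw, sensitivity=0.1):
    """Return updated (body_yaw, aim_yaw) after one frame of mouse input.

    body_yaw -- yaw of the player's body ("hips"), dragged along at the keyhole edge
    aim_yaw  -- yaw of the weapon/crosshair, driven directly by the mouse
    head_yaw -- yaw reported by the HMD tracker; the view direction is body_yaw + head_yaw
    """
    aim_yaw += mouse_dx * sensitivity
    view_yaw = body_yaw + head_yaw
    offset = aim_yaw - view_yaw
    # If the aim leaves the keyhole, rotate the body so the aim sits at the keyhole edge.
    if offset > KEYHOLE_HALF_ANGLE:
        body_yaw += offset - KEYHOLE_HALF_ANGLE
    elif offset < -KEYHOLE_HALF_ANGLE:
        body_yaw += offset + KEYHOLE_HALF_ANGLE
    return body_yaw, aim_yaw
```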

There is still a lot to figure out and, for me and others who want to see consumer VR succeed, this is both an opportunity and a concern. If too many people become sick using VR games, that would not be good.

One of the speakers at the conference, David A. Smith of Lockheed Martin, made the analogy between these early days of consumer VR and the early days of making movies. Initially, he said, movies were made by putting a camera in the first row of a theater and then filming the play as if you had a really good seat in the audience. Only later did the camera start playing a role in the movie - following characters around, changing angles, zooming in and out and more. Even the notion of cutting from one scene to another came later. Similarly, game control methods that were very effective in the mouse/keyboard/control pad era will need to evolve to provide effective and natural game control in the consumer VR world.

It might be that existing games simply cannot be ported well to a VR experience and that new games will have to be created. Which existing Xbox games have been ported effectively to the Kinect? Are the popular Kinect games those that were created specifically for the Kinect, or those that were successfully morphed into Kinect games? The former, I think.

Good game control is not just about figuring out how it needs to be done. Look at the list above: it uses mouse, head and control pad. How about steering with your hands, for instance? The control scheme is only as good as the sensors that are available to it. If it were possible to sense body motion and posture, identify surrounding objects or people, integrate gestures and understand voice commands, then a more comprehensive and perhaps more successful control scheme could be developed. Thus, I believe the barrier to consumer VR is no longer hardware but rather the available sensors and the way they are used to create a compelling user experience. As we've learned from the success of the Nintendo Wii, people buy experiences, not products, and in the case of the Wii, the experience was fueled by innovative sensors and an ingenious game design that took advantage of these sensors.



Tuesday, March 12, 2013

Limiting the physical risk in virtual and augmented reality

Brian Wassom, an attorney from Michigan, has a nice post on Staying Out of Trouble While Playing Augmented Reality Games. It details several concerns relating to playing augmented reality games, including run-ins with law enforcement (such as when pointing the phone at a police station), physical injury and commercial tie-ins.

On the physical injury side, Brian writes:

Physical Injury.  I’ve also written, spoken, and started discussions about the potential for AR gamers to hurt themselves while chasing digital objects in the physical world.  It turns out that this, too, has already happened–at least to both of the players to whom I spoke.  One admitted to slipping on ice and twisting their ankle; the other got themselves a bit scuffed up while searching through bushes for the exact coordinates of a resonator.
Other potential avenues for mishaps were spotted and avoided.  For example, they told me about one digital portal that was originally located in the driveway for a hospital emergency room.  This was reported to the game’s designers, who moved it out of the way.  Lessons like these should help future game designers avoid similar issues.

Players of any physical game inherently accept some risk of injury. One can twist an ankle or get a bit scuffed up playing basketball, not just augmented reality games. Nevertheless, I think that games that understand the user's environment can reduce the risk of injuries.

Pit demo from WorldViz
For example, in one of our favorite demonstrations at trade shows, we ask an attendee to put on one of our goggles and then open a virtual pit - a large abyss in the floor that only exists in the virtual world. We then ask that person to take a step forward into the pit. 99 out of 100 times, that person refuses to take the step forward, even after peeking multiple times into the real world and being convinced that the hole is not real and only appears in the virtual world. Why? Probably because we are hard-wired to avoid such obvious dangers as stepping into an abyss.

The same lesson could be applied in games. If the game is in the living room and the nearby sofa - an obstacle - is shown as a brick wall, most people just won't try to run into it. Holes, walls, pointed swords, moving cars could all be used to enforce the 'do not go there' message.

Of course, to do this the VR system would need a real-time understanding of the surrounding objects - where the sofas and walls are, and where the user is relative to them. We call this context and believe it will become critically important to VR and AR experiences, both as a way to avoid injuries and as a way to provide a dramatically more compelling experience.
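As a minimal sketch of what such context-awareness could look like, here is one way a game might decide which real-world obstacles to render as virtual barriers. The data layout, names and the 1-meter threshold are assumptions for illustration, not any existing system.

```python
# Minimal sketch (assumed API and data): given the user's position from a tracker
# and a list of known obstacles in the room, decide which ones should be rendered
# as virtual "do not go there" geometry (e.g. a brick wall over the sofa).
from dataclasses import dataclass
import math

@dataclass
class Obstacle:
    name: str
    x: float       # obstacle center in room coordinates (meters)
    y: float
    radius: float  # coarse bounding radius (meters)

WARNING_DISTANCE = 1.0  # start showing the virtual barrier within 1 m (assumed threshold)

def obstacles_to_render(user_x, user_y, obstacles):
    """Return (obstacle, distance) pairs close enough to warrant a virtual barrier."""
    nearby = []
    for ob in obstacles:
        distance = math.hypot(ob.x - user_x, ob.y - user_y) - ob.radius
        if distance < WARNING_DISTANCE:
            nearby.append((ob, max(distance, 0.0)))
    return nearby

# Example: a sofa roughly half a meter away gets rendered as a brick wall in the scene.
room = [Obstacle("sofa", 2.0, 0.0, 0.5), Obstacle("wall", 0.0, 3.0, 0.1)]
print(obstacles_to_render(1.0, 0.3, room))
```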

Tuesday, March 5, 2013

New Study: Are There Side Effects to Watching 3D Movies? A Prospective Crossover Observational Study on Visually Induced Motion Sickness

A new study has been recently published in the PLOS ONE scientific journal:

PLOS ONE: Are There Side Effects to Watching 3D Movies? A Prospective Crossover Observational Study on Visually Induced Motion Sickness:

The study examines the impact of watching 3D movies on inducing various types of simulator sickness: nausea, oculomotor symptoms (the oculomotor nerve controls most eye movements) and disorientation. It concludes that:
Seeing 3D movies can increase rating of symptoms of nausea, oculomotor and disorientation, especially in women with susceptible visual-vestibular system. Confirmatory studies which include examination of clinical signs on viewers are needed to pursue a conclusive evidence on the 3D vision effects on spectators.
Through questionnaires administered to nearly 500 adults, the study shows that
"Viewers reporting some sickness [...] were 54.8% of the total sample after the 3D movie compared to 14.1% of total sample after the 2D movie. Symptom intensity was 8.8 times higher than baseline after exposure to 3D movie (compared to the increase of 2 times the baseline after the 2D movie)."
At some level, this study confirms what those working in the 3D and VR industries have known for a long time: a 3D experience, and even more so one inside VR goggles, can be very intense and can provide significant sensory stimuli to the person. For instance, if head tracking is done incorrectly or with significant lag, a person wearing goggles can develop nausea within the first minute of use.

What is unclear from the study is what the long-term effects are. For instance, many people feel a bit dizzy after getting off a treadmill. The body perceives movement but no forward movement is actually taking place. However, this dizziness typically passes very quickly after dismounting the treadmill and does not seem to have any long-term effects. Is this also true with 3D goggles? Is the brain 'elastic' to the effects of 3D goggles in the sense that it reverts to the original state, or are there lasting effects?

As VR goggles become increasingly immersive, as sensors are added to make VR experiences engaging and compelling, how do we best address this issue?

  • What can be done to minimize the impact?
  • Should there be an acclimation period in 3D games? Just like a warm-up or cool-down period is recommended before and after physical exercise, perhaps there is some on-ramp and off-ramp to a 3D experience?
  • How can one tell if a person is more or less susceptible to these issues? The study cites the American Optometric Association and its estimate that 3-9 million Americans have problems in binocular vision.
  • What are the legal and liability implications of selling 3D goggles for consumer activity? What kind of warnings and disclaimers should be put in place?
  • Is there a way to tell - during a game - whether a user has become uncomfortable and stop or reduce the intensity level?
I suspect that as an increasing number of goggles make it into the consumer realm, this will become an important issue to be dealt with.

Sunday, March 3, 2013

Context - the next frontier in VR goggles

If VR goggles are starting to be affordable for a larger audience, what is the next challenge to be solved?

Indeed, the price barrier to virtual reality goggles is coming down. VR used to be cool yet expensive, but that is changing to just being cool. You can find decent goggles for around $500, or even build your own. Years ago, you had to spend upwards of $1000 on a good yaw/pitch/roll head tracker. Now you can buy decent ones for about $100, and if you want to make your own, the components will cost you just a few dollars. Sure, if you want professional-grade performance - truly high resolution, high-precision tracking, high-quality optics with very little distortion - you will still need to pay professional-grade prices. However, the high school teacher who wants some VR in the classroom, or the gamer who wants to wear some 3D on the head without breaking the bank, is starting to have viable options.

A VR goggle by itself, inexpensive as it may be, is still just a monitor on the head. To interact with the application - game, 3D paint program, operating system - you'll want to start adding sensors. The motion tracker on the goggles will tell you which way your head is turned, but that's not enough. How can I choose an option? Raise a weapon? Manipulate a virtual object? Sensors, sensors and more sensors. Perhaps a Kinect can help capture your posture; an eye tracker can figure out your gaze. A pair of motion controllers will tell you where your hands are. The list can go on and on: a body suit gives additional information on other limbs; a voice recognition chip can help with voice commands. Cameras on the goggles can identify faces, objects or markers around you. Biometric sensors can tell if you are breathing heavily, if your heart rate is up or even if you are drunk.

Sounds like a serious case of sensor overload with several problems. Some tactical problems include:

  • Complexity. How many sensors are you willing to wear in order for a game to know what it needs to know about you? Once you are done putting all the sensors on you, would you be willing to start instrumenting the gaming weapon? Putting markers on the walls? Connecting cameras in the room?
  • Which sensor to use when? A Kinect might be great at understanding your posture when you are right in front of the TV, but what happens when you walk away? How robust are sensors in various lighting conditions? What happens when one sensor breaks down? How does one determine which sensors are available and which sensors are best to use at any given time?
  • Timing and synchronization - can you read out the data from multiple sensors in a synchronized, low-latency fashion? Can you get it over to the host computer (game console, PC, tablet) wirelessly?
Tactical problems aside, the strategic issue is that very often, what the application really needs is not the raw data but rather some higher-level understanding of the context of the user. It's nice to know that both knees are bent at 90 degrees and that the user's body is stationary, but wouldn't the application just want to know that the user is sitting down? That the gaming weapon has been raised to the shoulder? That the user is very close to the sofa in the living room and, in fact, on a collision course with it?

Sensors give us data - unorganized facts that need to be processed. What we really want is information - data that has been organized, processed and put in context so that it becomes useful.
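To make the difference concrete, here is a minimal sketch of turning raw readings into a context label; the joint-angle and speed thresholds are assumptions for illustration, not a tested classifier. Instead of handing the game two knee angles and a body velocity, the framework reports "sitting", "standing" or "moving".

```python
# Minimal sketch (assumed thresholds): deriving a posture context label
# from raw joint angles and body speed.
def posture_context(left_knee_deg, right_knee_deg, body_speed_m_s):
    """Very rough posture classification: knees near 90 degrees while stationary = sitting."""
    knees_bent = left_knee_deg < 110 and right_knee_deg < 110
    stationary = body_speed_m_s < 0.1
    if knees_bent and stationary:
        return "sitting"
    if stationary:
        return "standing"
    return "moving"

print(posture_context(left_knee_deg=92, right_knee_deg=95, body_speed_m_s=0.02))  # "sitting"
```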


Context leads us to a higher-level understanding and is highly useful. Imagine writing a game where you truly understand the user's activity (running, jumping, sitting down, on the ground, holding a steering wheel), position (1 meter away from the sofa, lifting the gaming weapon, at a heading of 45 degrees relative to another player) and even the biophysical or emotional state - tired, excited, scared.

What is needed is a framework or architecture to collect data from sensors, fuse it and generate useful information and context. A low-cost goggle with decent image quality is good news, but getting the context right is key to the next level of immersion and interaction.
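As a closing thought experiment, here is a minimal sketch of what such a framework could look like. Every class name, sensor name and rule below is an assumption for illustration, not an existing API: sensors register simple read functions, and the application asks for derived context ("am I about to hit the sofa?") rather than raw tracker coordinates.

```python
# Minimal sketch (all names assumed): a framework that polls whatever sensors are
# available and exposes derived context, not raw data, to the application.
from typing import Any, Callable, Dict

class ContextEngine:
    def __init__(self):
        self.sensors: Dict[str, Callable[[], Any]] = {}                   # name -> read function
        self.rules: Dict[str, Callable[[Dict[str, Any]], Any]] = {}       # context -> derivation

    def add_sensor(self, name, read_fn):
        self.sensors[name] = read_fn

    def add_context_rule(self, name, derive_fn):
        self.rules[name] = derive_fn

    def update(self):
        """Poll every sensor once, then derive each piece of context from the readings."""
        readings = {name: read() for name, read in self.sensors.items()}
        return {name: derive(readings) for name, derive in self.rules.items()}

# Example: the game asks "is the user near the sofa?" instead of reading coordinates itself.
engine = ContextEngine()
engine.add_sensor("head_position", lambda: (1.2, 0.0))  # stand-in for a real tracker
engine.add_context_rule("near_sofa", lambda r: abs(r["head_position"][0] - 2.0) < 1.0)
print(engine.update())  # {'near_sofa': True}
```

The point of the sketch is the separation, not the specific rules: raw data stays inside the framework, and the game consumes context.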