Mike Fraser and Tony Glover
Communications Research Group,
Department of Computer Science,
University of Nottingham,
Nottingham,
UK, NG7 2RD
{mcf,atg}@cs.nott.ac.uk
This paper explores the design, representation and control of embodiments within a Collaborative Virtual Environment (CVE). The analysis is framed in terms of users' ability to control their virtual representation and the degree to which the embodiment conveys the user's intended actions. Two control methods are considered: an immersive interface utilising a head-mounted display, and a more traditional desktop interface. Comparisons between the design and implementation of these interfaces are discussed. Experimental analysis of the desktop implementation is presented, along with early results from the immersive evaluations.
Key Words: Collaborative Virtual Environments, Embodiment, Immersive and Desktop Interfaces
1. Introduction
Whilst developers of Virtual Reality (VR) strive for realism in many aspects of the interfaces they provide, we are still a long way from achieving completely realistic and seamless interaction. This disparity can cause problems, especially in collaborative VR systems, where realistic representations may lead co-participants to expect certain human behaviours within the context of the system. In Collaborative Virtual Environments (CVEs), the representation of activity to others, and the effects of that representation, must be considered. The interface to the CVE can impair the reversibility of perception (see [Robertson, 1997] for discussion) available in day-to-day interaction.
This work builds on studies from a number of sources in the field of user embodiment and control in CVEs. Some work has been done in aiming for high levels of intricacy and human realism in embodiments (e.g. [Guye-Vuilleme et al., 1998]). Other studies identify interaction, rather than realism, as the key factor in embodiment design, and approach the problem accordingly. Contexts of study in this area include experience in the design and use of CVEs ([Benford et al., 1997a], [Snowdon and Tromp, 1997]) and ethnographic analyses of CVEs in use ([Bowers et al., 1996a], [Hindmarsh et al., 1998]). Literature appears to divide into two approaches: achieving realism in all relevant aspects of VR embodiment and interface; or identifying key factors impairing interaction through a virtual space to provide explicit adaptations more suited to the current technology.
Whilst it is recognised that the first approach might prove profitable given a seamless VR interface, network speed and other technical constraints make this impossible at present, even with immersive technologies. Hence we adopt an initial embodiment and control design based on the literature of incremental studies of interaction in CVEs. However, we also recognise that representations which are broadly humanoid might help users identify types of co-present activity by association with real-world experience.
This paper describes the design, implementation and evaluation of two types of interface control within the MASSIVE-2 system [Benford et al., 1997b]: one using a 'desktop' VR interface, and one using a tracked-HMD interface, both outlined in section 3. Our representation design and its appearance to others is used as a common factor across these disparate interface types, and is directly governed by the activities available to the user.
2. Design
The user is provided with a range of activities within the CVE. Embodiment design is approached by considering how each of these actions or properties is represented to the user and to co-present users within the CVE. These factors have been collated in part from a review of the literature concerning incremental studies of embodiment design identified in the introduction (e.g. [Greenhalgh and Benford, 1995], [Benford et al., 1997a], [Snowdon and Tromp, 1997], [Bowers et al., 1996a], [Bowers et al., 1996b]), and in part from anticipated representation requirements identified by the authors.
| Issue | Representation support for user activity | Representation support for awareness of activity to co-present users |
| --- | --- | --- |
| Positional Navigation | Support for 6 degrees of 'lower body' freedom | Appearance of motion |
| Independent Gaze Navigation | Support for 3 degrees of 'upper body' freedom | Appearance of viewpoint |
| Variable Navigation Methods | An embodiment can be flying, walking, sitting, object-centred | Representation of the method of navigation in different media (e.g. footsteps, posture) |
| Manipulation | A directional actionpoint | Representation of manipulation and indication of target |
| Speech | Transmission of real-time audio input | Visual indication of speech |
| Awareness | Ability to perceive as large a non-distorted perspective of the world as possible | No external representation |
| Gesture | A viewpoint and actionpoint support a user's gesture | Support for gesture through pseudo-realistic arm representation |
| Level of presence | Ability to sleep (on distraction), capacity to die (on technical failure) | Presence failure should be discernible through embodiment alteration |
| Capabilities | Automatic user capability detection provides reflection of abilities of user | Notion of differences in manner of interface and usability |
| Personality | Users should be enabled to tailor representations to reflect their personality | Factors should enable other participants to identify embodiment differences easily |
| Environment consistency | Notion of differences in world perception | Possibilities include 'fading' of embodiments for co-present participants with inconsistent views |
3. Implementation
3.1 Desktop CVE implementation
A chief problem when designing input strategies for desktop VR is the level of control provided. The usual method is to use keyboard and mouse input to control representations of highly complex virtual humans, so that the appearance of the embodiment is many times more intricate than the level of control available over it. Conversely, we identify a desktop CVE representation whose appearance is based upon the properties the user can actually control.
Designing an interface in this way involves creating the most intuitive connections between the actions allowed and the possible uses of the input devices. The limitation is that activity in a three-dimensional world must be reproduced through an interface of lower dimensionality (keyboard, on-screen buttons, or mouse). Whilst it might be possible to let users define their own key-mappings, much of our work is with first-time or novice users. In these scenarios this level of flexibility becomes redundant, in that the effort expended in establishing an initial control mapping is often greater than that required for the remaining tasks combined. Conversely, the prototyping nature of these still relatively early technologies means that most experienced users are CVE developers themselves, and hence are less likely to require easy-to-use systems.
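To make the idea of a fixed mapping concrete, the sketch below (in Python, purely illustrative; the action names and bindings are our own and not those of MASSIVE-2) shows a small, fixed binding from core activities to desktop input events, the kind of mapping a novice user can pick up without any configuration step.

```python
# Illustrative sketch only: a fixed mapping from a small set of core CVE
# activities to desktop input events. Action names and bindings are assumed.
DEFAULT_BINDINGS = {
    "move_forward":  ("key", "Up"),
    "move_backward": ("key", "Down"),
    "turn_left":     ("key", "Left"),
    "turn_right":    ("key", "Right"),
    "gaze":          ("mouse", "drag"),       # 'upper body' / viewpoint control
    "point":         ("mouse", "button-3"),   # simple pointing gesture
    "manipulate":    ("mouse", "button-1"),   # select and move an object
}

def action_for(event):
    """Return the CVE activity bound to a (device, control) input event."""
    for action, binding in DEFAULT_BINDINGS.items():
        if binding == event:
            return action
    return None
```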
In constructing the desktop interface, it was decided that, in order to limit the variables involved in the initial evaluation, only the core CVE interface processes would be controllable. The interface therefore supported the following activities.
Navigation
The capacity to move and look around the virtual space is provided. In terms of representation, audio cues such as audible footsteps enable co-present users to identify movement. Positional and gaze navigation are controlled separately, but are combined to determine the overall direction of gaze.
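The separation of positional ('lower body') and gaze ('upper body') control can be illustrated with a short sketch; the following Python function is an assumption of how the two might be combined into a single world-space gaze direction, not the system's own code.

```python
import math

def gaze_direction(body_yaw_deg, head_yaw_deg, head_pitch_deg):
    """Combine the 'lower body' heading with 'upper body' gaze offsets into a
    world-space gaze vector. Purely illustrative; angles are in degrees."""
    yaw = math.radians(body_yaw_deg + head_yaw_deg)
    pitch = math.radians(head_pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),   # x
            math.sin(pitch),                   # y (up)
            math.cos(pitch) * math.cos(yaw))   # z
```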
Manipulation
During manipulation of objects, the embodiment orients an actionpoint and the viewpoint in the direction of the object being manipulated. A line appears from the 'arm' of the embodiment to the object in question. This serves two purposes: distinguishing manipulation from a pointing gesture; and connecting embodiment to object, implying that the object is part of the representation for the period of manipulation. This is intended to aid co-participants in discerning the ability in MASSIVE-2 to manipulate objects at a distance.
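As an illustration of the line described above, the following sketch computes its endpoints and the direction used to orient the actionpoint; the data layout is assumed for the example and is not the MASSIVE-2 representation.

```python
def manipulation_line(arm_position, target_position):
    """Endpoints of the visible line from the embodiment's 'arm' to the
    manipulated object, plus the direction used to orient the actionpoint.
    Illustrative sketch only; positions are (x, y, z) tuples."""
    dx, dy, dz = (t - a for a, t in zip(arm_position, target_position))
    length = (dx * dx + dy * dy + dz * dz) ** 0.5 or 1.0
    direction = (dx / length, dy / length, dz / length)
    return {"from": arm_position, "to": target_position, "direction": direction}
```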
Gesture
Standard sets of gestural capabilities have been provided in other desktop CVEs (e.g. [Guye-Vuilleme et al., 1998]). We decided against providing a simple set of social gestures, instead testing how easy gestures were to use by allowing only a single pointing gesture, used to reference an object or point in virtual space. CVE applications are often task oriented, and it was decided that object reference would serve as a 'test' gesture to identify any weaknesses in this approach during experimentation. It was also decided that the 'actionpoint' [Benford et al., 1997a] used in the manipulation and referencing of objects should be a pseudo-humanoid arm, in order to enable clearer identification of the procedure.
Speech
Speech remains largely unchanged from the current MASSIVE-2 implementation, which includes spatialisation and left-right pan algorithms to convey the point of origin. In terms of representation to others, 'speech bubbles' and moving mouths have been utilised to indicate which embodiment is the audio source.
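For readers unfamiliar with left-right panning, the sketch below shows a minimal constant-power pan derived from the bearing of the source relative to the listener. MASSIVE-2's spatialisation algorithms are more sophisticated than this, so the example is an illustration under our own assumptions rather than the system's method.

```python
import math

def stereo_gains(listener_pos, listener_yaw, source_pos):
    """Constant-power left/right gains for a sound source, from its bearing
    relative to the listener. listener_yaw is in radians; positions are
    (x, y, z). Minimal illustrative sketch only."""
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[2] - listener_pos[2]
    bearing = math.atan2(dx, dz) - listener_yaw        # positive to the right
    pan = max(-1.0, min(1.0, math.sin(bearing)))       # map bearing to [-1, 1]
    angle = (pan + 1.0) * math.pi / 4.0                # 0 .. pi/2
    return math.cos(angle), math.sin(angle)            # (left gain, right gain)
```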
Active Awareness
Peripheral awareness needs to be explicitly supported in a desktop interface due to the narrow field of view (typically 60 degrees). In the past, MASSIVE and MASSIVE-2 have used out-of-body camera views [Greenhalgh and Benford, 1995]. These were included in the desktop design. The use of perspective-distorted 'peripheral lenses' [Robertson et al., 1997] is being explored as an alternative method. This approach will be outlined in greater detail in the experimental section of this paper.
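The motivation for this extra support can be expressed as a simple visibility test. The sketch below is our own illustration, assuming a 60-degree horizontal field of view; it shows how easily a co-participant can fall outside the rendered view, and hence outside the user's awareness, without such aids.

```python
import math

def in_field_of_view(viewer_pos, viewer_yaw, other_pos, fov_deg=60.0):
    """True if another embodiment lies inside the viewer's horizontal field of
    view. viewer_yaw is in radians; positions are (x, y, z). Illustrative only."""
    dx = other_pos[0] - viewer_pos[0]
    dz = other_pos[2] - viewer_pos[2]
    bearing = math.atan2(dx, dz) - viewer_yaw
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))  # wrap to [-pi, pi]
    return abs(math.degrees(bearing)) <= fov_deg / 2.0
```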
3.2 Immersive-tracked CVE implementation
Interaction within the immersive environment is achieved by tracking the user's head and hand positions and orientations with a Polhemus tracking system. The user is free to convey meaning by moving within the physical environment; these actions are then reproduced in the virtual space by an appropriately articulated embodiment, whose arms and head attempt to mirror the corresponding physical motion.
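The per-frame coupling between tracker readings and the articulated embodiment might look like the following sketch; the pose records and the drop-out handling are assumptions made for illustration, not the MASSIVE-2 data model.

```python
# Illustrative sketch: copy tracked poses onto the embodiment each frame,
# reusing the previous reading if a sensor drops out. Pose records are assumed
# to be dicts with 'position' and 'orientation' entries.
last_pose = {"head": None, "left": None, "right": None}

def update_embodiment(head_pose, left_hand_pose, right_hand_pose):
    """Return the embodiment state for one frame, mirroring physical motion."""
    for key, pose in (("head", head_pose), ("left", left_hand_pose),
                      ("right", right_hand_pose)):
        if pose is not None:
            last_pose[key] = pose
    return {"head": last_pose["head"],
            "arms": {"left": last_pose["left"], "right": last_pose["right"]}}
```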
The general requirements of a desktop interface to a CVE [Bowman and Hodges, 1995; Bowman, Hodges and Bolter, 1996] are also provided by the immersive implementation. For evaluation, this was achieved in the following manner:
Navigation
The ability to move around the local environment is achieved by the user walking within the range of the tracking equipment. Coarser, longer-range navigation is achieved by pointing both arms in the direction of travel.
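A plausible, purely illustrative way to detect this two-arm travel posture is sketched below: if both tracked arm directions agree to within a threshold, a velocity along their average is returned. The threshold and speed values are assumptions, not those used in the system.

```python
import math

def travel_vector(left_dir, right_dir, agreement_deg=25.0, speed=1.0):
    """Return a travel velocity if both arms point roughly the same way,
    otherwise None. Arm directions are assumed to be unit (x, y, z) vectors."""
    dot = sum(l * r for l, r in zip(left_dir, right_dir))
    dot = max(-1.0, min(1.0, dot))
    if math.degrees(math.acos(dot)) > agreement_deg:
        return None                                    # arms disagree: no travel
    avg = [(l + r) / 2.0 for l, r in zip(left_dir, right_dir)]
    norm = math.sqrt(sum(c * c for c in avg)) or 1.0
    return tuple(speed * c / norm for c in avg)
```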
Manipulation
The MASSIVE-2 system already supports manipulation of objects via the mouse; however, this option is not viable with an HMD. Instead, the user points at the desired object in the virtual environment and a "ray" extends from the virtual hand to indicate which object is being interacted with, a technique similar to that outlined in [Bowman and Hodges, 1997]. An alternative input device is therefore provided for selection: a joystick with several buttons, allowing a participant to select an object and manipulate it whilst the joystick button remains depressed.
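Ray-based selection of this kind can be sketched as a nearest-hit test along the ray from the virtual hand. The version below uses bounding spheres as a stand-in for real geometry tests and is our own illustration, not the system's picking code.

```python
def pick_object(hand_pos, hand_dir, objects):
    """Return the name of the nearest object hit by the ray from the virtual
    hand, or None. 'objects' is a list of (name, centre, radius) tuples and
    'hand_dir' is a unit vector. Illustrative sketch only."""
    best = None
    for name, centre, radius in objects:
        to_centre = [c - p for p, c in zip(hand_pos, centre)]
        along = sum(t * d for t, d in zip(to_centre, hand_dir))
        if along <= 0:
            continue                                   # object is behind the hand
        closest_sq = sum(t * t for t in to_centre) - along * along
        if closest_sq <= radius * radius:              # ray passes within the sphere
            if best is None or along < best[0]:
                best = (along, name)
    return None if best is None else best[1]
```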
Gesture/Gaze
Gesturing is a feature of the immersive virtual embodiment. Because the user's hands are tracked, their virtual limbs may be articulated to give a degree of gestural capability. This can be used to point at objects and give clear feedback, or to make more social gestures that convey other meanings (such as waving or clapping).
Looking is again a natural by-product of using an immersive interface: the user turns their head in the direction in which they wish to look, and other participants see the corresponding head movements. This feature can also be used to convey more social meaning, such as nodding or shaking of the head.
4. Experiments
4.1 Desktop experiments
Experimental evaluations were performed using a simplified version of the new representation and the desktop CVE implementation (fig. 1), to identify potential strengths and weaknesses of this approach. The simplification involved the removal of certain properties irrelevant to our experimentation, notably customisation and level-of-presence indicators; these were not needed by our subjects, and their removal gave a shallower learning curve. A reporting facility was also included to show the user which objects within the world they were currently manipulating or pointing at, in order to increase feedback to the user about their current embodiment activity. An over-the-shoulder camera view was enabled for the same reason; this also increased the participants' field of view, and therefore their awareness of co-participants' activities.

Analysis of the experiments was achieved through ethnographic study of situated action. The use of ethnography in evaluating, and informing the design of, communication systems follows impressive precedents set in the field of Computer Supported Co-operative Work (CSCW) (e.g. [Bowers et al., 1996a], [Bentley et al., 1992]). Detailed results from these experiments are reported in [Hindmarsh et al., 1998]. However, three main points were raised, as follows.
One of the main points to arise in this evaluation was that what a user is perceived to be doing within the virtual world and what the user is actually doing in the real world must correspond as closely as possible. For example, if a user is represented as realistically "human", then that user's interface capabilities should be equally realistically "human". This also extends to other domains, such as differences in physical properties between the real and the virtual, and level-of-presence issues.
Suggested approaches to tackle the problems stemming from these analyses include:
For the last of these reasons, recent experiments of an identical sort have been performed using the HMD-Polhemus tracked interface described in section 3.2. The experiments were kept as similar as possible, given the constraints of working with an HMD (e.g. motion sickness), so that comparisons could be drawn. Although the data has only recently been acquired, the following section outlines our early impressions and comparisons using this method.
4.2 Immersive experiments and comparisons
For the second set of experiments an identical virtual world and similar tasks were utilised (see [Hindmarsh et al., 1998] for details), in order to justify comparative analysis.

Although detailed evaluations for these experiments will be published in future work, initial observations have been taken to identify contrasts with our desktop implementation.
5. Conclusions
This paper has described a comparison between the design, implementation and evaluation of two CVE interface styles: a standard desktop interface controlled through mouse and keyboard, and an immersive interface utilising an HMD and Polhemus tracking devices. A common design framework and the subsequent implementations were outlined, and experimental analyses described. Evaluations and early results indicate advantages and disadvantages to both methods, and suggest that further comparative analyses are required of these and other interface styles (e.g. projected displays and CAVEs). We have suggested that indications of realism may create assumptions and fragment interaction, and that these disparities should be considered when designing embodiments and interfaces to collaborative virtual systems.
Acknowledgements
Our thanks go to Steve Benford, Christian Heath and Jon Hindmarsh, who provided invaluable comments and input during this work.
References
[Benford et al., 1997a] Benford, S., Bowers, J., Fahlen, L., Greenhalgh, C. and Snowdon, D., Embodiments, Avatars, Clones and Agents for Multi-user, Multi-sensory Virtual Worlds, in ACM Multimedia Systems 5, 2, March 1997
[Benford et al., 1997b] Benford, S., Greenhalgh, C. and Lloyd, D., Crowded Collaborative Virtual Environments, in Proc. ACM Computer-Human Interaction (CHI'97), 1997
[Bowers et al., 1996a] Bowers, J., Pycock, J. and O'Brien, J., Talk and Embodiment in Collaborative Virtual Environments, in Proc. ACM Computer-Human Interaction (CHI'96), 1996
[Bowers et al., 1996b] Bowers, J., O'Brien, J., and Pycock, J., Practically Accomplishing Immersion: Co-operation in and through Virtual Environments, in Proc. ACM Computer-Supported Cooperative Work (CSCW'96), 1996
[Bowman and Hodges, 1997] Bowman, D. and Hodges, L., An Evaluation of Techniques for Grabbing and Manipulating Remote Objects in Immersive Virtual Environments, in Proc. 1997 Symposium on Interactive 3D Graphics, pp. 35-38
[Bowman and Hodges, 1995] Bowman, D. and Hodges, L., User Interface Constraints for Immersive Virtual Environment Applications. Graphics, Visualisation, and Usability Center Technical Report, GIT-GVU-95-26
[Bowman, Hodges and Bolter, 1996] Bowman, D., Hodges, L. and Bolter, J., The Virtual Venue: User-Computer Interaction in Information-Rich Virtual Environments, Graphics, Visualisation, and Usability Center Technical Report, GIT-GVU-96-22
[Hindmarsh et al., 1998] Hindmarsh, J., Fraser, M., Heath, C., Benford, S. and Greenhalgh, C., Fragmented Interaction: Establishing Mutual Orientation in Virtual Environments, to appear in Proc. ACM Computer-Supported Cooperative Work (CSCW'98), November 1998
[Robertson, 1997] Robertson, T., Co-operative Work and Lived Cognition: A Taxonomy of Embodied Actions, in Proc. European Conference on Computer-Supported Cooperative Work (ECSCW'97), 1997
[Guye-Vuilleme et al., 1998] Guye-Vuilleme, A., Capin, T., Pandzic, I., Thalmann, N., Thalmann, D., Nonverbal Communication Interface for Collaborative Virtual Environments, in Proc. Collaborative Virtual Environments (CVE'98), 1998
[Robertson et al., 1997] Robertson, G., Czerwinski, M. and van Dantzich, M, Immersion in Desktop Virtual Reality, in Proc. ACM User Interface Software Technology (UIST'97), 1997
[Snowdon and Tromp, 1997] Snowdon, D. and Tromp, J., Virtual Body Language: Providing Appropriate User Interfaces in Collaborative Virtual Environments, in Proc. ACM Virtual Reality Software and Technology (VRST'97), 1997