Jolanda G. Tromp, Mike C. Fraser.
This paper argues that the design of spaces and objects for VEs needs special attention in terms of narrative structure. Because of the freedom of navigation and interaction in VEs, the flow of interaction is not as predictable as it is for 2D systems. To support VE user interactions we may need to develop a system for additional guidance. The development of such a system may be informed by storyboarding techniques.
Direct Manipulation, Affordances, Flow of interaction, Usability Design, Storyboarding, Narrative flow, Virtual Reality.
Virtual environments (VEs) are intended to provide three-dimensional graphical representations of spaces for people to interact with. Users can roam freely through these spaces and interact with objects and with other users present in the same space.
Whilst direct manipulation interfaces dynamically present 2D objects of interest, VEs provide movable views onto 3D spaces representing places or structured objects in an attempt to create a coherent order to events and information (Kaur, 1998). Direct manipulation is concerned with allowing users to directly act out their goals by providing them with icons which represent objects traditionally used in the real world to perform those actions (for example, a paint brush icon to represent the action of painting).
Davenport & Bradley argue that we are bridging the gap between real and virtual worlds as researchers explore new forms of computer-mediated interactions among people, bits, and atoms. Designers are creating tangible user interfaces - digitallu augmented spaces, surfaces, objects, and instruments that make bits physically acessible and manipulable via graspable objects and ambient media. These interfaces emphasize visually intensive, hands-on foreground interactions and convey information subtly through our peripheral perceptions of the goings-on in the space around us. The room itself, the information that flows through it, the objects it contains, and the people engaged with it become collaborative co-actors (Davenport & Bradley, 1997, pp. 10-11).
Because of the freedom of navigation and interaction in VEs, it is difficult to predict what actions users will take, and in what order they will perform them. Users have been shown to struggle with finding the right order in which to perform actions, with finding their way through the environment, and with navigating into precise positions (Kaur, 1997; Tromp & Snowdon, 1997; Tromp et al., 1998b). Basically, we need to supply users with intuitively understood sequential affordances of the environment (Gaver, 1991).
Standard HCI design methods, developed for 2D interfaces, use the notions of direct manipulation, perceptual affordances and sequential affordances to address issues of representation and user action guidance (Preece et al., 1994). Affordances are the properties of an object which determine how it can be used (Norman, 1988). Perceptual affordances are visible controls which suggest functionality. Sequential affordances refer to the notion that acting on one perceptual affordance can reveal new information, which in turn indicates new affordances. The issue is how we can use these notions to inform the design of 3D objects and spaces for VEs.
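As a minimal illustration of the distinction, the following Python sketch (with hypothetical class and attribute names, not taken from any particular VE toolkit) models a VE object whose controls only become perceivable after an earlier action has been taken, in the way that, for example, a teleporter might only reveal its controls once it has been opened.

from dataclasses import dataclass, field

@dataclass
class Affordance:
    name: str                                    # e.g. "open", "select destination"
    visible: bool = True                         # perceptual affordance: visibly suggested up front
    reveals: list = field(default_factory=list)  # affordances exposed by acting on this one

@dataclass
class VEObject:
    name: str
    affordances: list

    def perceivable(self):
        # Affordances the user can currently see and act on.
        return [a for a in self.affordances if a.visible]

    def act_on(self, affordance_name):
        # Acting on one perceivable affordance reveals further ones
        # (the sequential affordance).
        for a in self.perceivable():
            if a.name == affordance_name:
                for revealed in a.reveals:
                    revealed.visible = True
                return True
        return False

# Example: a teleporter whose controls only appear once it has been opened.
controls = Affordance("select destination", visible=False)
teleporter = VEObject("teleporter", [Affordance("open", reveals=[controls]), controls])
print([a.name for a in teleporter.perceivable()])  # ['open']
teleporter.act_on("open")
print([a.name for a in teleporter.perceivable()])  # ['open', 'select destination']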
Arnheim states that visual perception is not a passive recording of stimulus material, but an active concern of the mind. Perception involves problem solving, because everything in this world presents itself in context and is modulated by that context. When the image of an object changes, the observer must know whether the change is due to the object itself, to the context, or to both; otherwise he understands neither the object nor its surroundings. Intertwined though the two appear, one attempts to tease them apart, especially by watching the same object in different contexts and the same context acting on different objects. The object under observation must, then, be abstracted from its context. This can be done in two fundamentally different ways. The observer may wish to peel off the context in order to obtain the object as it is and as it behaves by itself, as though it existed in complete isolation. The observer may also wish to find out about the object by observing all the changes it undergoes and induces because of its place and function in its setting. Here the abstraction, while singling out the object, does not relinquish the effects of the context but relies on them for an indispensable part of the information (Arnheim, 1969, pp. 36-37).
Suchman introduced the term 'situated action' to refer to the notion that every course of action depends in essential ways upon its material and social circumstances. Rather than attempting to abstract the action away from its circumstances and represent it as a rational plan, the approach is to study how people use their circumstances to achieve intelligent action. Rather than build a theory of action out of a theory of plans, the aim is to investigate how people produce and find evidence for plans in the course of situated action (Suchman, 1987, p. 50). Even the specialist's skill is situationally defined. The key to being an "expert" is not "to learn everything", but to reduce the size of a "knowledge situation" until one is able to master all that is in it (Meyrowitz, 1985, p. 326).
During the COVEN Project a virtual environment was built on top of dVS from Division Ltd. This environment provides 3D information in the general area of travel. Two inspections (Cognitive Walkthroughs) of the COVEN platform were made (Steed et al., 1997; Tromp et al., 1998) to establish whether the design supported users, and if not, how to improve it. Numerous remarks made by the inspectors illustrate a need for guidance, which can be traced back to the lack of narrative structure in the design of the interactions. Below are a few examples.
"The overall design of the rooms is not consistent in terms of realism and details. The semi-real metaphor is not fully consistent: there is a real world feeling but still some objects are floating in the air. This can sometimes be confusing. Also, some task-unrelated objects are represented (typically, radiators) - why these, why not others (lamp, plants, etc.)?"

Figure 1: View of the Tourist Information room for the island of Rhodes, Greece. On the left wall is the slide-show; in front of it is a table with the CDs and the CD player.
Another example is the CD player, which was implemented as a metaphor for a slide-show accompanied by a voice-over of tourist information. These are some of the comments on the design:
"CD player is not obviously visible unless you are close to it, I find it very hard to read the CDs. I hardly can speak of a selection process; as user I just try, not clear what the choices are unless you get right in front + look down (too difficult to navigate such a move), only then can one read names of the CDs, It is not obvious to me that playing a CD would involve a slides show. When I activate a CD, I am expecting sound, not images. This is because the CD player looks like the audio CD player I have at home, not like a multimedia CD-ROM appliance, The option for playing a CD-ROM is not obvious. You come upon it when playing with the mouse. It actually comes as a surprise that a simple CD selection activates the player (not consistent with the usual behaviour of selections within the application). Risk for errors, Anyway, the show is fine. But once started, how do I stop this lady telling me about Rhodos? I cannot find a stop button."
An example of the need for guidance on the sequence of tasks is the lay-out of the main tourist information office room and the task-related or functional objects in it. The user first has to go through the door.
"Opening the door of the Meeting Room wasnt as easy as I thought. The door was slammed in my face for three times. And what a sound was accompaning it: for a moment I thought I was in Mickael Jacksons thriller! I think a user prefers that the doors opens after touched the handle and will place him in the middle of the next room."
Next, the user is expected to find the CD slide-show, but the table with the CDs and CD player is not placed in a very prominent position. After having negotiated the CD slide-show, the user is expected to take a virtual flight over the holiday destination by entering a teleporter, which is located in a corner of the room. However, again the object is not placed in a prominent position, and there is not much to guide the user's interest towards this other main functional object in the room.
"It is not clear what the function of the teleporter is, unless you know what it is. The outside does not suggest anything about its functionality and Maybe a textual tag would help, e.g. "to virtual Rhodes". Or is it part of the fun to entertain mystery? Teleporter needs a label."
Finally, problems arise again at the level of structure in the object interaction:
"The controls for the teleporter are not obvious since they are unlabelled. Indeed one has to open the teleporter first before these controls become apparent."
The correct performance of actions and sequences of actions in the VE relies greatly on the design of the interface of the VE and its objects. User actions in VEs oscillate between user-created story-lines (i.e. the successful performance of action sequences) and user interface struggles (i.e. the inability to perceive or perform the correct action, the absence of feedback, or both). We could therefore support users by creating one or more possible story-lines, as if we were creating a 3D, role-play-like storybook. Here we can learn from the art of exhibition design. Exhibitions are ideally regarded as a form of sculpture: "They are three-dimensional compositions which recognise the importance of solids and voids and strive for satisfactory spatial relationships." (Lawson, 1981). Indeed, Parent notes that using a storyboard alongside a requirements analysis is a good way to prepare and evaluate the design of a virtual exhibit (Parent, 1998).
Davenport et al. (1991) maintain that we will increasingly need to create multimedia systems that maintain a conversational mode of interaction with users by generating and tracking story frameworks. In browsing, users' personal goals and intentions may collide with the system's ability to foster coherent meaning; viewer comprehension tends to break down, therefore, when participants begin browsing. While this breakdown is often blamed on weaknesses in the navigational tools, it more frequently reflects the lack of semantic representation that can support the limited look-ahead functions required to build meaningful interactions with the user (Davenport, Aguirre Smith, Pincever, 1991, p. 68).
In VEs we want to create easily understandable environments and guide users in their expectations. For most VEs the choice is made to represent the environment in a realistic way in order to allow users to transfer their daily knowledge of operating in the real environment to the virtual one, based on the principle that the individual's common-sense knowledge of the world is a system of constructs of its typicality (Schutz, 1973, p. 8).
However, real-time image generation puts a high load on machine processing time, and in the case of networked, collaborative VEs (CVEs) on the network traffic generated by the real-time image update. To reduce this load the functionality and appearance of the environment and objects are reduced to the bare minimum. Because of this simplification it is often difficult for users to predict which operations are available and which are not. Users have been shown to struggle with the interface for these reasons (Kaur, 1997; Steed & Tromp, 1998).
This incidentally means that designing the affordances of objects in the VE is a constant trade-off between realism and simplification - between user needs and utilised computing resources (Tromp, COVEN del). A balance needs to be found between the essential and non-essential elements of the object, so that the user can still perceive the correct actions and functions, while the machine load is kept to a minimum.
Often this simplification results in a more or less cartoon-like representation of the VE and its objects. The objects represented should be caricatures, which ideally act as a form of amplification through simplification (McCloud, 1993). If we can reduce the representation of the object to its most salient features and functions, we can guide the user more easily to the next correct action. This technique is also known as sequential art.
Sequential art makes use of timing, framing, panels, and composition. "Critical to the success of a visual narrative is the ability to convey time." (Eisner, 1985, p. 26). It is this dimension of human understanding that enables us to recognize and be empathetic to surprise, humor, terror, and the whole range of human experience. The device most fundamental to the transmission of timing is the panel, frame or box. The act of panelling the action not only defines its perimeters but establishes the position of the user in relation to the scene and indicates the duration of the event. The act of framing separates the scenes and acts as a punctuator. The lines drawn around the depiction of a scene, which act as a containment of the action or a segment of the action, have as their function the task of separating or parsing the total statement. Where narrative art seeks to imitate reality in a meaningful chain of events and consequences and thereby evoke empathy, the dimension of time is an inescapable ingredient. The sequential artist 'sees' for the user, because it is inherent to narrative art that the requirement on the viewer is not so much analysis as recognition. The task then is to arrange the sequence of events (or images) so as to bridge the gaps in action. Given these, the viewer may fill in the intervening events from experience and by exploration. Each panel should be regarded as a stage wherein an arrangement of elements takes place. They must be arranged with a clear purpose. Nothing in a panel should be accidental or placed there casually. The primary concern in composing a scene is the center of attention. The mission is to focus on the major item or action by placing it in the area of major attention. The panel has a geometric shape and a 'focal point' which the viewer's eye first engages before moving on to absorb the rest of the scene. Each panel has its own 'focal point' depending on its shape.
In addition to exploring the caricatures of representations, we also need to explore the caricatures of situations. Live-action films are stripped-down versions of reality which increase the intensity of the story, thereby guiding the viewers in their anticipation of the next action (Straczynski, 1996). If we can simplify the situation in which the user finds herself at a particular moment in the VE interactions, we can predict the next user action more easily, thereby clearing the path of the user's story in the direction we would like the user to go. Effectively, each user creates her own story-line in a VE: of all possible actions, the user will have to select one, which then leads to the next set of choices, and so on.
Throughout the evolution of cinema, directors, cinematographers, and editors sought to discover ways in which cinematic language could both camouflage and create a perceptual awareness of time and space. Whether a match cut or a jump cut, the relationship between the last frame of one shot and the first frame of the next shot generated new theories of continuity and ideas for sequence constructions. Synchronous sound is of assistance in creating and sustaining a semantic dialogue and helps to create ambient sound. In motion picture parlance, syntax refers to how action and sound are framed in individual shots and how these shots can be ordered into sequences. While the audience experiences linear movies as unified entities, filmmakers experience movies as the generation of individual shots and sound elements (Davenport, Aguirre Smith, Pincever, 1991, pp. 69-70).
The designer will have to help the VE user identify the actions and objects necessary to perform their tasks, especially the order in which they are to be used. Some chunks of information have to be interpolated by the user, some have to be attached to the objects, before the user can make sense of the environment. The sequence of appearance is extremely important. The items of information function as elements of a story, and while the arrangement may be flexible and open, the elements have to be assembled in a particular order to make sense of the story. And that order should be designed to guide the users through their tasks.
The perceptual affordances of 3D objects in VEs need to be improved by choosing simplifications of the objects such that the available functions on the objects are amplified as much as possible. A VE task usually involves several interactions with multiple objects in a specific order; these sequential affordances need to be designed with more care. This can be achieved by designing the environment so that user attention is directed from one object to the next, as desired. Generally, partial tasks can be automated, and guidance for sequential affordances can be provided by structuring the lay-out of the rooms and the position of the objects more deliberately. Instead of arbitrarily positioning objects and rooms, they can be grouped and ordered into meaningful parts which intentionally draw the user from action to action. Standard HCI alerting techniques for guiding user attention to the next action, such as the use of colour, flashing, and reverse video, are not very elegant solutions in a VE, especially when it concerns the design of multi-user VEs. However, the use of spatial and temporal cues and audio warnings may be much more effective. Designing the spatial lay-out of rooms and objects more carefully, and providing more carefully designed object affordances, could improve the usability of VEs. Simplifying the VE by deliberately designing caricatures of objects and situations may be a more effective way of keeping machine load down without losing usability.
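As an illustration of such deliberate sequencing, the following sketch represents the tourist office task as an ordered list of story steps. The names and cues are hypothetical examples, not features of the COVEN platform; the sketch merely shows how the next uncompleted step can determine which object should receive the next unobtrusive attention cue.

from dataclasses import dataclass

@dataclass
class StoryStep:
    object_name: str  # the functional object the step centres on
    cue: str          # unobtrusive spatial, temporal or audio cue drawing attention to it

# The tourist office as an ordered story-line rather than an arbitrary lay-out.
TOURIST_OFFICE_STORY = [
    StoryStep("entrance door", "door opens on touching the handle and places the user inside"),
    StoryStep("CD player table", "table placed in the line of sight from the door"),
    StoryStep("teleporter", "soft audio cue and a label such as 'to virtual Rhodes'"),
]

def next_cue(completed_objects):
    # Return the first step the user has not yet completed,
    # i.e. the object that should receive the next attention cue.
    for step in TOURIST_OFFICE_STORY:
        if step.object_name not in completed_objects:
            return step
    return None

print(next_cue({"entrance door"}).cue)  # cue for the CD player table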
Navigation and fine-tuned positioning can be improved by creating trenches that automatically bring a participant to the optimal path or viewing distance. All objects have an optimal viewing distance associated with them, which increases with the number of participants trying to access the same area. Participants have an optimal collaboration distance associated with their embodiments. The auto-propelling and auto-tracking properties of the furrows or trenches and personal space distancers propel the participants to their goal automatically. Participants can interrupt this automatic movement by a proportionally greater co-movement from their input device. With low-intensity user control, participants simply gravitate inside the gravity wells of the surrounding others and objects, like a slow-motion pinball in a pinball machine.
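The following sketch illustrates one possible reading of this auto-propelling idea; the function, gain constant and distance model are assumptions for illustration, not part of dVS or the COVEN platform. The participant drifts towards an object's optimal viewing distance, and a proportionally greater input from the participant's own device overrides the automatic movement.

import math

def auto_propel(position, target, optimal_distance, user_input, gain=0.1):
    # Blend a gravity-well pull towards the target's optimal viewing distance
    # with the participant's own (x, y, z) input vector.
    to_target = [t - p for p, t in zip(position, target)]
    distance = math.sqrt(sum(c * c for c in to_target)) or 1e-9

    # The pull grows with the error between the current and the optimal distance.
    pull = gain * (distance - optimal_distance)
    auto_step = [pull * c / distance for c in to_target]

    # A proportionally greater user input overrides the automatic movement.
    user_magnitude = math.sqrt(sum(c * c for c in user_input))
    if user_magnitude > abs(pull):
        step = user_input
    else:
        step = [a + u for a, u in zip(auto_step, user_input)]
    return [p + s for p, s in zip(position, step)]

# Example: drifting towards an object whose optimal viewing distance is 2 units.
pos = [10.0, 0.0, 0.0]
for _ in range(5):
    pos = auto_propel(pos, target=[0.0, 0.0, 0.0], optimal_distance=2.0,
                      user_input=[0.0, 0.0, 0.0])
print(pos)  # the participant has been carried closer to the optimal distance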
The field of view can be subjectively enlarged by stretching the outside edge of the workspace in the outward direction only. By making the walls impenetrable but at the same time flexible to pressure in an outward direction, participants can increase their field of view until it encompasses the other participants and relevant shared objects in one view. Automatic behaviors to initiate phatic communication and signal turn-taking could take the form of a sequence of automatic actions which occur the moment a participant enters the VE. The avatar could automatically search for the optimal place in the VE where the participant will have the best view of the other participants. If this automatic sequence is uninterrupted by the participant, the avatar could start waving automatically and go up to the nearest other active avatar and smile; if still uninterrupted, it could go to the next nearest avatar, and so on. An utterance from a speaker could be automatically accompanied by slow nodding of the head, smiling, slowly turning the head from one side of the view to the other, etc. Participants intending to collaborate could make their avatars automatically assume positions oriented at an angle of about ninety degrees, from which they may turn their heads to interact face-to-face. Or the avatars could automatically form small triangles, squares, or circles, depending on the number of participants in the group.
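A minimal sketch of such an interruptible entry sequence, again with hypothetical names and actions: the avatar works through a list of automatic phatic actions, and any input from the participant cancels the remainder of the sequence.

ENTRY_SEQUENCE = [
    "move to the spot with the best view of the other participants",
    "wave",
    "approach the nearest active avatar and smile",
    "approach the next nearest avatar",
]

def run_entry_sequence(input_events):
    # Yield automatic actions until the participant produces any input of
    # their own, which cancels the remainder of the sequence.
    events = iter(input_events)
    for action in ENTRY_SEQUENCE:
        if next(events, None) is not None:
            break
        yield action

# No input for two steps, then the participant moves the mouse.
print(list(run_entry_sequence([None, None, "mouse moved", None])))
# ['move to the spot with the best view of the other participants', 'wave']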
Whatever the best solutions may be, it seems that Davenport and Bradley (1997) have correctly identified the key directions for research and development: "we are going to witness a profusion of complex tool systems wherein several autonomous tools simultaneously cooperate to assist you - in fact, they will probably compete ferociously among themselves for the privilege. The tools of the future will be intelligent, dynamically adaptive, customizable, and personalizable to a staggering degree. With experience, they will learn and grow and wear to fit the specific craftsman's hand. Their complex functionality will be deeply couched in metaphor or story, and their internal operations will be hidden from view until demanded." (Davenport & Bradley, 1997, p. 9).
Arnheim, R., (1969). Visual Thinking, University of California Press, USA.
Davenport, G., Bradley, B., (1997). The Care and Feeding of Users, in: Visions and Views, IEEE Multimedia, Jan-March, pp. 8-11.
Davenport, G., Aguirre Smith, T., Pincever, N., (1991). Cinematic Primitives for Multimedia, in: IEEE Computer Graphics & Applications, July, special issue on multimedia, pp. 67-74.
Eisner, W., (1985). Comics & Sequential Art, Poorhouse Press, Florida.
Frohlich, D.M., (1993). The History and Future of Direct Manipulation, in: Behavior and Information Technology, V12(6), pp. 315-329.
Gaver, W.W., (1991). Technology Affordances, in: (eds.) Robertson, S.P., Olson, G.M, & Olson, J.S., Proceedings of CHI'91: Reaching through Technology, ACM, N.Y., pp. 79-84.
Kaur, K., Maiden, N., Sutcliffe, A., (1997). Interacting with Virtual Environments: an evaluation of a model of interaction, in: Interacting with Computers, Special Issue on VR.
Kaur, K., (1998). Designing Virtual Environments for Usability, PhD thesis, Centre for HCI Design, City University London, UK.
Lawson, F., (1981). Conference, Convention and Exhibition Facilities: a handbook of planning, design and management.
McCloud, S., (1993). Understanding Comics: The Invisible Art, Kitchen Sink Press, NY.
Meyrowitz, J., (1985). No Sense of Place: the impact of electronic media on social behavior, Oxford University Press, New York.
Norman, D., (1988). The Psychology of Everyday Things, Basic Books, New York.
Parent, A., (1998). A Virtual Environment Task Analysis Workbook for the Creation and Evaluation of Virtual Art Exhibits, National Research Council, XX.
Preece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S., Carey, T., (1994). Human-Computer Interaction, Addison-Wesley, New York.
Schutz, A., (1973). Collected Papers: The Problem of Social Reality, Natanson (ed.), Martinus Nijhoff, The Hague, Holland.
Steed, A., Tromp, J., Normand, V., Dijkhuis, J., (1997). Combined Usability Inspection Report, Anthony Steed (ed.), Public Deliverable COVEN ACTS Project N. AC040.
Steed, A., Tromp, J.G., (1998). Experiences with the Evaluation of CVE Applications, Proc. of Collaborative Virtual Environments 98 (CVE98), University of Manchester, 17-19 June, 1998, D. Snowdon and E. Churchill (eds.), pp. 123-130.
Straczynski, J.M., (1996). The Complete Book of Scriptwriting, Titan Books, London.
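Suchman, L.A., (1987). Plans and Situated Actions: The Problem of Human-Machine Communication, Cambridge University Press, Cambridge, UK.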
Tromp, J. and Snowdon, D., (1997). Virtual Body Language: Providing appropriate user interfaces in collaborative virtual environments, in Proceedings of Symposium on Virtual Reality Software and Technology 1997 (VRST'97) September 15 - 17, 1997, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.
Tromp, J.G., Steed, A., Sandos, A., Thie, S., (1998). 2nd inspection of the COVEN Platform, in: Deliverable 3.5, COVEN ACTS Project N. AC040.
Tromp, J., Bullock, A., Steed, A., Sadagic, A., Slater, M., Frecon, E., (1998b). Small Group Behavior Experiments in the COVEN Project, in: IEEE Computer Graphics and Applications, Vol. 18, No. 6, November/December 1998, pp. 53-63.