In this excerpt from Chapter 6 of Make It So: Interaction Design Lessons From Science Fiction (Rosenfeld Media), Nathan Shedroff and Christopher Noessel discuss how sound is used as an interface in the movies, and what lessons we can take from this to apply to real-world designs.
Figure 6.1 The Day the Earth Stood Still (1951).
After the alien Klaatu is shot by a nervous soldier, his robot companion, Gort, appears menacingly from within the landed spacecraft. Its visor slowly rises to reveal a disintegration beam, which Gort uses to begin destroying all weapons in sight, including artillery and a tank. Klaatu, wanting to de-escalate the situation, turns to Gort and shouts, “Gort! Declet ovrosco!” In response, Gort stands down, ceasing the counterattack (Figure 6.1).
Gort isn’t the first piece of technology in sci-fi to have a conversational interface. That honor belongs to the wicked robot Maria from Metropolis, but because it is a silent film, we never hear the commands delivered to her. With The Day the Earth Stood Still, we can follow the tone, pace, and responses in real time, with no need to interpret lip movements and intertitles, which makes it easier to study this primary example of a sonic interface.
What Counts?
Sound becomes part of the interface when it is an input or output for a system’s state and function. Note that this is distinguished from simple audio content, such as the music from a radio. We’ve broken out sonic interfaces into two broad categories: sonic output and voice interfaces.
Sound Effects
We commonly encounter systems that use sounds for output: status, alerts, and responses. For example, our telephones play a distinct tone for each button pressed on the numeric keypad. Alarm clocks buzz to wake us up. Cars chime to remind us to buckle our seat belts. We see similar system sounds throughout sci-fi, as well. Audiences have come to expect some kind of audio interface because it helps us understand the action in a film or TV show.
A Brief Experiential History
The ringing of a telephone was one of the first sonic interfaces common in people’s lives, but even though the telephone appeared in the late 1800s, it remained one of the few sound interfaces until well into the 1950s. Though they produced sound to deliver content, even radio and television didn’t deliberately employ sound in their interfaces until much later. Sound effects of this time were analog and mostly confined to appliances such as alarm clocks, buzzers on ovens, and bells on timers.
From the production side of things, beeps and buzzes, chimes, and tones can be added to a soundtrack along with all of the other sounds necessary in a TV show or movie, such as the click of a button or the creak of a door. The art of adding such sounds is called Foley, after Jack Foley, the sound engineer who launched the field in 1927, and it is more complex than one might initially think. For example, almost every sound in a movie other than the actors’ voices is added after filming is complete, and many sounds are created using objects that differ from those being portrayed. Sometimes even the actors’ voices are rerecorded in a studio and put back into the soundtrack of the film, where they must be precisely synchronized with the original speech and the actors’ lips.
Despite this complexity, sound plays such a strong role in conveying a sense of realism and futuristic technology that studios have included simple effects in sci-fi as part of sonic interfaces since the advent of talking pictures.
For example, when we hear the specific double beep with rising tone in Star Trek, we know the system is hailing someone. Similarly, when we hear the same tones reversed, we know that the communication is over and the channel is now closed. If the same beep were used in each case, it would confuse both us and the characters as to whether the channel was open or closed (Trek Core hosts most iconic sounds from Star Trek).
Lesson: Assign one system sound per system event
Users need to be able to differentiate system sounds to understand their meaning. Systems that use multiple sounds and sound sequences to communicate system messages will require some learning, but ultimately they communicate more information. In addition, the sounds need to be used consistently with specific actions in order to be associated with those actions.
The later Star Trek series used nearly twice as many different system sounds, sequences, and voice responses as the original series. This differentiation could speak to the sophistication of the ship’s systems in specifying audio output with more precision, to the care the production designers took and the better tools available to them, and to the audience’s increasingly sophisticated expectations of system sounds in an interface. Regardless, because many of our expectations are set or influenced by what we see in media, developers must consider more sophisticated sound solutions in their interfaces.
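As a rough sketch of how this lesson might translate to code, the snippet below (using the browser’s Web Audio API) gives each system event its own short tone sequence and never shares a sequence between events. The event names and pitches here are hypothetical placeholders, not sounds from any of the films in the survey.

```typescript
// Sketch: one distinct tone sequence per system event, used consistently.
// Event names and pitches are hypothetical, not drawn from any film.
const ctx = new AudioContext();

const eventTones = {
  channelOpened: [660, 880],    // rising pair, like a hail
  channelClosed: [880, 660],    // the same pair reversed: unmistakably "closed"
  messageReceived: [523],
  errorOccurred: [220, 220],
};

function playEvent(event: keyof typeof eventTones): void {
  let t = ctx.currentTime;
  for (const freq of eventTones[event]) {
    const osc = ctx.createOscillator();
    const gain = ctx.createGain();
    osc.frequency.value = freq;
    gain.gain.value = 0.2;
    osc.connect(gain).connect(ctx.destination);
    osc.start(t);
    osc.stop(t + 0.12);         // short beeps, spaced so the sequence reads clearly
    t += 0.15;
  }
}

playEvent("channelOpened");     // plays the rising pair
```

Because opening and closing a channel use the same pair of tones in opposite order, a listener can tell the two events apart without confusing one for the other.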
Ambient Sound
The ambient clacking of moving parts within a mechanical computer, like reels turning to access a section of tape memory, can be considered part of an interface because the clicks indicate that the system is working, even though this sound is mostly a by-product and not a designed signal.
In sci-fi, we find numerous examples of computer systems making such sounds, particularly to signal to audiences that they’re working to process a large set of data. When Scotty, the chief engineer of the Enterprise on Star Trek’s original series, remarks that he can tell that the ship’s engines aren’t tuned correctly because the hum they are producing is slightly off, he is calling attention to such sounds. But this doesn’t have to be an accidental by-product. With digital technologies, we can include this information deliberately.
Lesson: Convey ambient system state with ambient sounds
Different ambient sounds can unobtrusively inform a user that a system is operating and indicate its current state in broad strokes. Ambient sounds need to strike a balance between being the sonic focus and being too far in the background. If the sounds are completely unnoticeable, they aren’t useful; to be effective, they should stay out of the way until attention is required, such as when a system problem arises. This means sound levels must be calibrated, in advance or dynamically, so that background sounds can come forward into a user’s attention when needed.
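To make the idea concrete, here is a minimal sketch, again using the Web Audio API, of a continuous ambient hum that stays quiet while a hypothetical health metric is normal and swells forward, slightly detuned, as it degrades.

```typescript
// Sketch: an ambient "engine hum" that stays in the background while the system
// is healthy and comes forward, slightly detuned, when it is not.
// readSystemHealth() is a hypothetical stand-in for real telemetry (0 = bad, 1 = good).
const ctx = new AudioContext();

const hum = ctx.createOscillator();
hum.type = "sine";
hum.frequency.value = 80;                  // a low drone that is easy to tune out

const level = ctx.createGain();
level.gain.value = 0.02;                   // barely audible while all is well
hum.connect(level).connect(ctx.destination);
hum.start();

function reflectHealth(health: number): void {
  const now = ctx.currentTime;
  // Worse health brings the hum forward and pushes it sharp, the way Scotty's
  // mistuned engines sound slightly "off."
  level.gain.setTargetAtTime(0.02 + (1 - health) * 0.2, now, 0.5);
  hum.detune.setTargetAtTime((1 - health) * 60, now, 0.5);   // up to 60 cents sharp
}

function readSystemHealth(): number {
  return Math.random();                    // placeholder for an application-specific metric
}

setInterval(() => reflectHealth(readSystemHealth()), 1000);
```

The slow time constants keep the changes gradual, so the hum drifts toward the listener’s attention rather than jumping into the foreground.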
Directional Sound
Humans naturally hear in three dimensions. Our ears are extremely sensitive, capable of discerning microsecond differences between the sound waves reaching each ear. Systems that produce sound directionally can enhance our understanding of where a sound source is in space, along with its direction and speed. Because our sense of directionality is fast and subconscious, it must be reproduced precisely when replicated technologically, but the effect gives users information that is immediately understandable and actionable. An example of apologetics helps explain its power.
Figure 6.2 Star Wars Episode IV: A New Hope (1977).
When Luke and Han climb into their gunner stations aboard the Millennium Falcon, they strap themselves in, turn on the targeting computer, and put on headphones. As TIE fighters speed by, we hear the roaring approach of their engines, the piercing blasts of the laser cannon fire, the fading zoom as they speed away, and, if the Imperial pilots are less lucky, the boom as their ship explodes (Figure 6.2). Few people pause to consider the physics of the situation, but where are these sounds coming from? After all, there is no air in space to convey sound waves between the exploding TIE fighter and the Falcon.
Of course we could excuse this as a convention of film, a way that the filmmakers engage the audience in the firefight. But if it helps the audience, wouldn’t these same sounds help the gunners, too? What if this wasn’t a filmmaker’s trick but a powerful feature of the weapon system itself? Let’s presume that the Falcon’s sensors are tracking each TIE fighter in space and producing the roars and zooms directionally to provide a layer of ambient data that helps the gunner track opponents, even when there are several targets or they are out of sight. This makes the sound effects a powerful sonic aspect of a mission-critical system.
Opportunity: Consider using spatial sound for nonspatial information
Hearing people don’t have to learn to locate sounds in space; it’s a built-in capability. Designers can use this to place information, even when it’s not “naturally” spatial, in the space around the user. For example, if an interface for monitoring stock market portfolios used sounds to draw attention to trading activity that was likely to affect the portfolio, those signals could be made to seem closer than sounds used to indicate other activity. A user might need to be trained on the meanings behind arbitrarily assigned directions, but thereafter these cues would provide context to help inform more concrete tasks.
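A minimal sketch of that portfolio idea, assuming the Web Audio API and a hypothetical relevance score for each alert, might position the tone nearer to or farther from the listener according to how much the activity matters:

```typescript
// Sketch: spatializing a nonspatial signal. A hypothetical "relevance" score
// (0 = routine market noise, 1 = likely to affect the portfolio) controls how
// close the alert tone sounds to the listener.
const ctx = new AudioContext();

function playTradingAlert(relevance: number): void {
  const osc = ctx.createOscillator();
  osc.frequency.value = 440;

  const panner = ctx.createPanner();
  panner.panningModel = "HRTF";            // binaural cues; works best over headphones
  panner.positionX.value = 0;
  panner.positionY.value = 0;
  panner.positionZ.value = -1 - (1 - relevance) * 9;   // 1 m to 10 m in front of the listener

  osc.connect(panner).connect(ctx.destination);
  osc.start();
  osc.stop(ctx.currentTime + 0.2);
}

playTradingAlert(0.9);   // an alert that matters: sounds nearby
playTradingAlert(0.1);   // background activity: faint and distant
```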
Directional sound is a little tricky to portray to more than one user in a space and works best when the sound delivery is spatially constrained, such as with headphones or in a small area occupied by one person. When more than one person inhabits a space, their individual capabilities, orientations, and locations need to be processed so accurate sound can be sent to each one separately, which may be prohibitively complicated for most multiuser systems.
Music Interfaces
We see only two interfaces in the survey that use music as a part of the interface as well as part of the content. The first example is Close Encounters of the Third Kind, in which a specific tonal sequence forms a welcome message—one of acknowledgment and understanding. The five tones are G, A, F, F (an octave lower), and C. This musical phrase is implanted telepathically into a few people who encounter smaller alien spacecraft as an invitation to visit the massive mother ship that arrives at the climax of the film. When the mother ship appears at Devil’s Tower, the US Army greets it with the same tones played on a specialized electronic organ (Figure 6.3).
Figure 6.3a–c Close Encounters of the Third Kind (1977).
It is a simple musical interface, with a user playing a standard synthesizer keyboard. As each note sounds, a corresponding colored light illuminates on a huge array. This is the visual part of the alien language, which is vital for complete communication.
The other example is an interface from Barbarella that uses music as a weapon. As in Close Encounters of the Third Kind, the music is part content, part interface. Here, the evil scientist Durand-Durand straps Barbarella into a seat within his musical torture device, the Excessive Machine. Each note he plays on the keyboard simultaneously performs nefarious sexual acts on its victim, in an attempt to pleasure her to death. Though the exact cause and effect is demurely hidden from view, it is worth noting for the synergy of the playing, the music, and the intent (Figure 6.4).
Figure 6.4 Barbarella (1968).
Opportunity: Make music in the interface
What if interfaces could use music to indicate system status? It would not be without its challenges: encoding meaning into the music, handling users’ preferences for different styles of music, and processing data aesthetically so that its patterns are intelligible rather than cacophonous. But once these challenges are solved, music could offer a pleasant and largely unexplored way to convey system information, particularly ambient information.
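One tentative sketch of how the cacophony problem might be tamed: quantize incoming readings onto a pentatonic scale, so any stream of values still produces consonant notes. The readings and the mapping below are purely illustrative, built on the Web Audio API.

```typescript
// Sketch: keeping data-driven music from becoming cacophonous by quantizing
// readings onto a pentatonic scale. The readings themselves are hypothetical.
const ctx = new AudioContext();

// C-major pentatonic (C4, D4, E4, G4, A4): any combination stays consonant.
const scale = [261.63, 293.66, 329.63, 392.0, 440.0];

function playReading(value: number): void {            // value normalized to 0..1
  const index = Math.min(scale.length - 1, Math.floor(value * scale.length));
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();
  osc.frequency.value = scale[index];
  gain.gain.setValueAtTime(0.2, ctx.currentTime);
  gain.gain.exponentialRampToValueAtTime(0.001, ctx.currentTime + 0.4);   // gentle decay
  osc.connect(gain).connect(ctx.destination);
  osc.start();
  osc.stop(ctx.currentTime + 0.4);
}

// For example, sonify a stream of load readings, one note per second.
[0.2, 0.4, 0.35, 0.8, 0.6].forEach((v, i) => setTimeout(() => playReading(v), i * 1000));
```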
Sound and music are only the start, of course. Things get really interesting once you move to voice (and there are different types of voice interfaces). Finish reading this chapter—and this book—by purchasing it now from Rosenfeld Media (UXmas readers can save 15% on any title from Rosenfeld Media by using the code UXMAS).