The Perceptual Layer: In the mind’s ear

Nowadays, we usually take in written language by reading it silently. This was not always the case. For thousands of years after the invention of writing, texts were normally decoded by reading aloud: a considerable help in puzzling out the meaning of a continuous scribble without punctuation, even without spaces between the words. Indeed, the earliest form of punctuation was a system of marks for use by the professional orators who recited texts aloud at public readings. The marks, distant ancestors of the comma, period, and semicolon, indicated where the reader should take a short pause for breath, a longer pause to indicate the end of a sentence, or a pause of intermediate length for clarity or emphasis. The earliest unambiguous mention of silent reading comes from the end of the fourth century, when St. Augustine remarks in amazement on the prodigious skill of his contemporary, St. Ambrose:

When he read, his eyes scanned the page and his heart sought out the meaning, but his voice was silent and his tongue was still. Anyone could approach him freely and guests were not commonly announced, so that often, when we came to visit him, we found him reading like this in silence, for he never read aloud.

It was several hundred years later still that silent reading became normal in the Western world. To this day, when we read, we normally perform a live mental translation of the text into spoken language. Most people, when reading stories for pleasure, and especially when reading dialogue, put the text through this process of purely mental subvocalization; sometimes it even spills over into physical subvocalization, and the lips actually move. This happens above all with poetry, in which the sound and rhythm of the words are nearly always of cardinal importance. The most effective poetry can tempt us to read aloud, even if we have nobody to read to but ourselves, in the good old-fashioned way that every kind of text was read before St. Ambrose’s remarkable innovation. At the other extreme, some kinds of technical writing, such as mathematical equations or chemical formulae, cannot be translated adequately into spoken language, and we receive those in unrelieved silence.

If a story is told orally, of course, this process of translation does not occur. A somewhat analogous process, however, can happen when the storyteller has a foreign accent or uses an unfamiliar dialect. Then we have to listen closely, to translate the sounds coming out of his mouth into words that we actually know and understand; and this introduces a possibility of error. For instance, in some accents found in the southern U.S.A., there is a peculiarity that linguists call the ‘pen–pin merger’. The short e of pen is pronounced exactly like the short i of pin, so that it can be necessary to clarify which word is meant. In Arkansas, you will often hear people say ‘writing pen’, pronounced ‘writing pin’; this distinguishes a pen from a straight pin, safety pin, hatpin, or any other kind of pin that is likely to come up in conversation.

All this business of translation and error-correction takes place on the second layer of our model, the Perceptual layer. The actual letters on the page, or the actual sound-waves coming out of the speaker’s mouth, are converted into words that we can understand; and usually the process is so smooth and unconscious that we never notice it happening. When errors do occur, the communication crashes to a halt, and all the layers above the Perceptual are temporarily halted in their operation. Our attention is snatched away from the story to deal with the emergency. ‘What was Jimmy using a pin for? Was he pricking holes in the paper? Oh, you mean a pen! Why didn’t you say so?’ Then we have to get our bearings again before the story can continue.

This is why modern how-to-write books so often caution against using dialect in fiction. What they really mean (but do not say, because the people who write such books know surprisingly little about language) is that you have to be sparing in using phonetic spelling to indicate dialect. A little goes a long way. If we tried to accurately spell the sounds made by our Arkansan friend when he says ‘writing pen’, they might come out something like RAH-din pee-yin. This is so far from the familiar shape of the written words that it might as well be a foreign language; but when we hear him speak, we do not get that impression at all. If we want the Perceptual layer to do its job quickly and economically, but we need the reader to ‘hear’ the dialect with his mind’s ear, we might write the phrase as writing pin, as I did above. That gives the Perceptual layer enough cues to work on, but not enough to stall the translation process so that it draws attention to itself. The kind of thing Mark Twain used to do in Huckleberry Finn is no longer recommended:

‘What makes me feel so bad dis time ’uz bekase I hear sumpn over yonder on de bank like a whack, er a slam, while ago, en it mine me er de time I treat my little ’Lizabeth so ornery. She warn’t on’y ’bout fo’ year ole, en she tuck de sk’yarlet fever, en had a powful rough spell; but she got well, en one day she was a-stannin’ aroun’, en I says to her, I says:

‘“Shet de do’.”’

Twain was well advised to represent the sounds of dialect in this painfully faithful way, because sound recording was in its infancy when the book appeared in the 1880s, there was no broadcasting, and many of his readers had never heard anything much like the Southern Negro dialect that Jim is using in this passage. Nowadays, everyone is familiar with a wide variety of dialects, whether through electronic media or by the personal experience made possible by cheap travel. A few hints are enough to convey the effect. A modern writer might tone the passage down to something like this:

‘What makes me feel so bad this time is because I hear sumpn over yonder on the bank like a whack, or a slam, while ago, and it mind me o’ the time I treat my little ’Lizabeth so ornery. She warn’t only ’bout fo’ year old, and she took the scarlet fever, and had a powerful rough spell; but she got well, and one day she was a-standin’ around, and I says to her, I says:

‘“Shut the do’.”’

The grammar of Jim’s dialect is preserved in this version, and just enough hints of the nonstandard pronunciation that the reader’s practised ear for regional accents can fill in the rest. Some readers, especially those who are not Americans, have real trouble deciphering the dialect in Huckleberry Finn, because the phonetic spelling is so very far from the default pronunciation that they are personally used to getting from the Perceptual layer.

An additional difficulty arises with postmodern writers and readers. An excessive concern for cultural and racial sensitivity has led recent critics to insist on two ways of treating dialect, which are radically incompatible and cannot both be done at the same time. One school holds that it is racist to mention dialect at all, so that Jim (for instance) should be represented as talking perfectly ordinary American English, the then equivalent of ‘Broadcast Standard’. The other school counters that the very idea of ‘standard’ English is itself racist, and the entire story (narrative as well as dialogue) should be cast in the most faithful representation of dialect possible. This method was used, for instance, in an unauthorized retelling of Gone With the Wind, in which even the title was rendered into African-American Vernacular English: The Wind Done Gone.

Some readers (I am not one of these) find audiobooks easier to ‘get into’, because the Perceptual layer has had much of its work done for it by the narrator. A good voice actor can greatly smooth the path of a story into the reader’s brain, not only by eliminating the step of mental translation from writing into speech, but by tying down the factors of intonation and pacing to a particular physical implementation. There is no question of reading an audiobook quickly or slowly; no possibility of skimming the text or missing the intended meaning of the punctuation.

There is another potential advantage. A reader in a hurry, especially when losing interest in a book, is always in danger of falling into a state of MEGO – ‘my eyes glaze over’. The brain wants to skip past the boring bit, and even if the reader resists, manfully struggling with the text in sequential order, the temptation is there to resort to ‘speed reading’ and other shortcuts. But excessively fast reading is no friend to comprehension, and a positive enemy to the kind of deep, trance-like participation that a good story invites in its audience. The speed reader flits from point to point on the surface of the text, looking for bits of information that will let him skate by with minimal effort – as if he were reading an airport timetable. He does not give himself the leisure to enter imaginatively into the scenes of the story, let alone the psychology of the leading characters. The top three layers of our model, where the deepest engagement with stories occurs, are closed off by this method of reading; and even at the lower levels, the details are blurred and lost. For a habitual speed reader, audiobooks may remove the temptation to scan the story too fast to enjoy it.

Speed reading is dangerous to prose fiction, but fatal to poetry. We can roughly define an effective poem (which may be, but is not necessarily, a good poem) as one that really uses the poetic devices of metre and cadence; it is not just prose arbitrarily broken into separate lines. In an effective poem, the sound of the words is of paramount importance, and shares largely in creating the emotional effect. Rhyme, metre, alliteration, assonance, parallelism, and chiasmus, to name a few of the tricks of the poet’s trade, work chiefly on the Perceptual layer: they help create a music of words that means more than the mere words themselves. This is one instance in which the Perceptual layer is justified in calling some attention to itself, because it is delivering a heightened emotional payload that would otherwise be filtered out by the linguistic processes of the Syntactic and Semantic layers above. Rhyme, metre, and the rest play the same kind of role in a poem as incidental music in a film – about which I shall have more to say shortly.

If we stop to think about these considerations, we can perceive some more principles of the layered model. Each layer has to have a quality that we might call transparency. If a book is badly printed, so that the text is partly illegible, it takes an inordinate effort just to decipher the words: the Formal layer, so to speak, is opaque, and the Perceptual layer does not get the necessary information to do its own job. If we skim over the text, or if the dialect is too thick for easy comprehension – if, for that matter, we are just poor readers, or reading in a second language in which we are not fully fluent – then the Perceptual layer becomes opaque, and does not let sufficient information through to the layers above. Each layer of the model is totally dependent on the proper functioning of the layers beneath it; just as the roof of a house depends on the walls, and the walls depend on the foundation. If the lower layers don’t do their job, the layers above them cannot stand.

So far, we have been considering the role of the Perceptual layer in written and oral stories. Other media give this layer additional jobs to do. Comics or picture books, for instance, rely on the reader’s eyes and brain to extract information from visual images, and use it properly in the context of the story. Many learned books have been written on the sciences of perspective, colour perception, and visual composition; we need not recapitulate them here. Dramatic performances – stage plays, films, and video – add greatly to the complexity of the Perceptual layer, and we must stop to analyse some of that complexity. Let us consider what happens when a story is translated into a movie.

First of all, the descriptions of characters and scenery in a written story have to be converted to sounds and visual images. The how-to-write books are always exhorting writers to use all five senses in their stories. Fortunately, this advice is seldom followed, because at bottom it is nonsense. On the one hand, we humans have many more than the traditional five senses; the sense of balance, for instance, as well as kinaesthesia, the sense of motion, and proprioception, the sense of our own body parts. On the other hand, the filmmaker has to rely on sight and hearing exclusively; but this turns out not to matter much, because we get most of our information about the world from our eyes and ears. Our sense of smell is relatively weak, touch does not operate at a distance, and taste works only on things we actually put into our mouths. The writer who describes things mostly in terms of sight and sound may be breaking the rules of the how-to-write books, but he is not necessarily writing less effective stories because of it.

Now, sounds and images have a direct effect on the nervous system and on the emotions. Our brains are wired to respond immediately and viscerally to the things we see and hear; much less so to language, which, after all, is a relatively recent invention in the history of life on earth. A written or oral story builds up pictures and sounds by describing them with words; the pictures and sounds themselves are constructed only in the audience’s imagination, and this, as we shall see, happens on the Diegetic layer, after all the processing of language has been completed. A film or a play can bypass the imagination completely, and deliver pictures and sounds directly to the brain. This gives the director additional toys to play with, and makes up, in part, for the emotional distance imposed by the medium of drama.

What I mean by ‘emotional distance’ is this. In a written story, especially, the reader can gain direct access to the thoughts and emotions of the characters, which cannot be seen or heard, but only experienced from within. Drama views characters entirely from the outside; we only hear the thoughts that they speak aloud, and only share in the emotions that they reveal by their actions, or by tone of voice. Filmmakers, especially, can get round some of these limitations by inserting extra sense-impressions into the Perceptual layer. There is the clumsy device of having the actor speak the character’s thoughts in a voice-over, but this has become such a laughable cliché that it is hardly used anymore except for comic effect. Other techniques remain more powerful and more acceptable.

The oldest trick of this kind, as old as drama itself, is the use of incidental music. The very first Greek dramas already had a chorus to give a running commentary on the deeds of the main characters; and the chorus could not only recite lines, it could sing them, using music to heighten the emotional effect. Much later on, opera was invented essentially as a way of making the dramatis personae express their emotion through music; some very famous operas could almost be described as an extended wallow in the characters’ feelings.

Along with music, drama can heighten the effectiveness of a scene with sound-effects. Conventionally, the effects are referred to as diegetic sounds – that is, they are understood to be occurring inside the story, and the characters on the stage can hear them; whereas incidental music is extra-diegetic, a kind of outside commentary that is heard only by the audience.

This rule is not as hard and fast as it is generally made out. In the Star Wars films, for instance, we hear the roar of spaceship engines and the swooping and swooshing noises of fighters manoeuvring in space. Of course, sound does not travel in a vacuum, as George Lucas knew perfectly well. When he was filming the original Star Wars in the 1970s, he did test screenings of space battles in which the action was realistically silent. The scenes bombed. It turned out that the roars and swooshes, as much as the John Williams orchestral score, are necessary to get the audience emotionally involved in the battles. They awaken the kinaesthetic sense, make the viewer feel the motion of the ships, the thrust and centrifugal force; they make the bad guys’ ships seem more menacing, the good guys’ aerobatic manoeuvres seem more skilful and heroic. In effect, these are extra-diegetic noises; they are part of the soundtrack, along with the music, and can only be understood as such.

Visual effects, too, operate at the Perceptual layer, and in a very characteristic way. They are not what they appear to be. The physical objects placed in front of the camera do not look to the cast and crew as they will (after post-production) to the audience. Gollum in the Lord of the Rings films was a monstrous creature, definitely inhuman in appearance, but very realistically animated using the best CGI then available. But on the set, he was Andy Serkis in a motion-capture suit, looking nothing like the Gollum of the finished movie. Green screens, CGI, matte work, double exposures, split exposures – all the techniques of ‘trick’ filmmaking, new and old, are designed to trick the viewer’s brain into seeing something at the Perceptual layer that the Formal layer could not normally deliver.

Mel Brooks, as every film-buff knows, is a brilliant comedy director, and invented several techniques that have become part of every comedy filmmaker’s stock in trade. One of his most characteristic devices is a kind of audiovisual pun, in which he presents elements that would be extra-diegetic in a normal film, and suddenly reveals that they are actually interior to the story. This technique is sometimes confused with the similar device called ‘breaking the fourth wall’, but it is actually quite different, and worth studying.

He makes the best use of this device in Blazing Saddles. There is, for instance, a scene where we see Bart, the newly minted black sheriff, riding across the sagebrush, looking more like a New York fashion model than an Old West lawman – complete with a saddle labelled Gucci. In keeping with his decidedly un-Western looks, the soundtrack is playing a very un-Western piece of music: ‘April in Paris’, performed by Count Basie’s Orchestra. The camera angle changes, and suddenly we see Count Basie himself, with his orchestra behind him, playing the incidental music in the middle of the desert. The soundtrack is actually a part of the story, and Bart pauses to ‘lay five’ on the Count before riding on. The device works because we have already perceived the incongruity between the scene and the stereotypical Western, and were led to believe that was the joke. The actual appearance of Count Basie comes as a surprise, because we thought Brooks was leading us up to a totally different punchline; and so we laugh all the louder.

Later on in the film, there is a magnificently overdone fight scene that spills out of the ‘Western frontier town’ set and rampages all over the Warner Brothers lot, sucking the casts and crews of other movies helplessly into the fray. At first, this looks like a straightforward case of ‘breaking the fourth wall’; but Mel Brooks is not a straightforward director. The fight was instigated by Harvey Korman’s character, the villainous Hedley Lamarr, in a speech that cued the audience to expect a certain kind of joke: ‘You will be risking your lives, whilst I will be risking an almost-certain Academy Award nomination for the Best Supporting Actor.’ But this is not the joke that Brooks actually delivers. After a sequence of increasingly silly and Pythonesque scenes, in which the brawl invades one section of the Warner studios after another and finally breaks out into the streets of Los Angeles, Korman takes a taxi to the cinema where Blazing Saddles itself is having its premiere. Popcorn in hand, he takes his seat, and watches the dénouement of the story on the screen.

Hollywood had previously used the device of showing the filmmakers at work in an extra-diegetic sequence, intercut with the diegetic scenes inside the film; but here both the inside and outside views are diegetic! The ending of the movie attains the kind of fractured double perspective that we might see in an M. C. Escher print. The silliness is successfully reined in, and the inside and outside views of the story are unified and justified in an artistically satisfying conclusion.

All the raw materials for this extended conceptual prank are carefully delivered at the Perceptual layer, and since they are delivered in the form of images and sounds rather than language and dialogue, they ‘tunnel’ right through the layers to deposit their explosively funny payload square in the Immersive layer. This is the kind of express delivery that film can do and written fiction cannot; though, to be fair, it is virtually impossible nowadays to produce the precise comic effect of Blazing Saddles, because the audience is wise to this kind of double vision, and is liable to anticipate the climax and spoil the joke.

This ‘tunnelling’ is not usual in the storytelling model, but it does occur and we need to be mindful of it. Conceptually, it is similar to the ‘tunnelling protocols’ one sees in the OSI model. Tunnelling on a computer network violates the normal hierarchy of layers, by constructing its own data packets at a high layer, encrypting or otherwise altering them, and then wrapping them up a second time in the normal packet structure to be handed off to the regular network. It is used for things like setting up virtual private networks, or for sneaking contraband data through a firewall by disguising it as something else. The kind of tunnelling we see with incidental music, or with Mel Brooks’s characteristic gags, actually skips over the middle layers, because it bypasses the use of language. It plays a similar role to tunnelling on a network, even though its internal functioning is quite different. In both cases, information is delivered ‘raw’ to a distant point in the model, by encoding it in a form that cannot be ‘cooked’ by the normal processes occurring on the levels in between.
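For readers who would like to see the network half of the analogy in concrete form, here is a minimal sketch in Python. It is a toy, not a real protocol: the `Packet` class, the function names, and the addresses are all invented for illustration, and Base64 merely stands in for the ‘encrypting or otherwise altering’ step (it is not encryption). The point it tries to show is that the intermediate hops handle only the outer wrapper, so the inner packet arrives at the far end ‘uncooked’.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class Packet:
    """A toy packet (hypothetical): a header the network reads, plus a payload."""
    src: str
    dst: str
    payload: bytes


def encapsulate(inner: Packet, tunnel_src: str, tunnel_dst: str) -> Packet:
    """Serialize the inner packet, alter it, and wrap it in an outer packet."""
    raw = json.dumps(
        {"src": inner.src, "dst": inner.dst, "payload": inner.payload.decode("utf-8")}
    ).encode("utf-8")
    # Base64 is only a stand-in for encryption here; it hides nothing securely.
    return Packet(tunnel_src, tunnel_dst, base64.b64encode(raw))


def forward(packet: Packet) -> Packet:
    """An intermediate hop: it inspects the outer header and never the payload."""
    _ = (packet.src, packet.dst)  # routing decisions would be made from these
    return packet


def decapsulate(outer: Packet) -> Packet:
    """The far end of the tunnel: unwrap and restore the original inner packet."""
    fields = json.loads(base64.b64decode(outer.payload))
    return Packet(fields["src"], fields["dst"], fields["payload"].encode("utf-8"))


# Illustrative addresses only.
inner = Packet("10.0.0.5", "10.0.0.9", b"hello")
outer = encapsulate(inner, "gateway-a", "gateway-b")
delivered = decapsulate(forward(forward(outer)))
assert delivered == inner
```

The hops in the middle do their ordinary work on the outer packet and are none the wiser about what it carries. In the storytelling case, by contrast, the middle layers are not fooled but simply skipped; the sketch illustrates only the network side of the comparison.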

Once the Perceptual layer has done its work, the visual or auditory cortex hands the text, or images, over to other parts of the brain. Let us follow it there.

(To be continued.)

Comments

Mary Catelli says

4 May 2025 at 16:58

Some media work better for some things than for others.

For instance, if the heroine is to be fairest of them all, you can be told that in prose. It takes rhetorical skill to be convincing, but in a visual medium, you are stuck with actual looks.

Then, again, I once read both the light novel and the manga for a work. The manga first. The isekai’ed hero once went to meet a friend he had made in the new world, and found him murdered. We have the stunned reaction as he kneels by the body, and can imagine his feelings. In the light novel, his thoughts were all about what to do next, not at all a reaction to the death of his friend. Some media work better with some skill sets.

- Tom Simon says
  
  4 May 2025 at 18:06
  
  This is very true; and it is why we generally come out of an adapted work feeling that the adaptation is inferior to the original, but occasionally, that the adaptation captured the spirit that the original didn’t quite evoke. The more appropriate medium gets the more powerful response that we equate with authenticity.
  
  - Wendy S. Delmater says
    
    9 May 2025 at 10:19
    
    A perfect example of the movie being better than the book for just such a reason as you suggest? Minority Report by Philip K. Dick. The movie was much better.
    
- Stephen J. says
  
  6 May 2025 at 14:54
  
  “It takes rhetorical skill to be convincing, but in a visual medium, you are stuck with actual looks.”
  
  Wolfgang Petersen originally didn’t want to have Helen of Troy appear in his sword-and-sandals epic TROY (2004) at all, feeling that there was no way any actual human actress could live up to the legend.
  
Stephen J. says

5 May 2025 at 22:13

“It turned out that the roars and swooshes, as much as the John Williams orchestral score, are necessary to get the audience emotionally involved in the battles.”

One “explanation” I saw posited for this, in the realm of STAR WARS “fanon”, actually struck me as quite brilliant: All the noises of starship combat (except for those naturally produced by the ship itself, like the roar of its own engines) are actually simulated by each ship’s own flight computer, deliberately exploiting the organic pilots’ auditory sense to convey key information — e.g. rather than doing something like flashing a red light on a flat screen and beeping at a faster and faster pace, the computer translates the data about an approaching enemy vessel into an audible noise simulating an engine roar, whose pitch and quality indicates the type of ship, whose volume indicates proximity (increasing as the foe gets nearer), and whose origin point within the control cockpit indicates relative position (the sound comes from speakers behind the pilot to indicate a rear-approaching ship, from his left or right side to indicate a flanking one, etc.) Everything the audience hears during a spaceship fight is actually what the pilots are hearing — what is first perceived as diegetic, and then accepted as extra-diegetic for dramatic purposes, is pulled back into the diegetic layer by a completely extra-diegetic explanation provided by the fans and viewers. Which is one of my favourite examples of what you once called “legosity”.

- Tom Simon says
  
  6 May 2025 at 10:17
  
  That’s extremely clever!
