Idylls on the Ides

Today is the Ides of March by the old Roman reckoning. It is, of course, most famous as the day of the year when Julius Caesar was assassinated, but long before that it was a day of special importance on the Roman calendar: the traditional start of the campaigning season, when the winter rains (and snows in high country) were over, and the ground was dry enough for Roman legionaries to march forth and hack Gauls, Etruscans, or Samnites to pieces. This was the Roman national sport before they conquered the whole of Italy and hired gladiators to do their hacking by proxy.

As the start of the season, it seems like a good day for this hack to report on recent doings. I have been fiddling about with various AI writing tools, some useless, some worse than useless, and none so intelligent as advertised. The fact is, large language models – LLMs – are not the ‘intelligence’ they are advertised to be. They can mimic human intelligence to the extent that they are trained on a corpus that includes the writings of humans who had something intelligent to say. When pressed beyond the bounds of their source data, or sometimes even when not pressed, they fall back upon bafflegab, vagueness, and a disturbing tendency simply to make things up.

I have found that the ‘AI’ programs with a chat-based interface are actually handiest for developing complex story scenarios, as they don’t try to make every scene self-contained, and I can choose to direct the story in promising ways as it goes along. For instance, I got one of these LLM tools to send my textual alter ego on a trip to the dangerous borderlands of a Viking kingdom. The program, obligingly serving up the distillation of decades of bilge-literature on that general subject, dropped hints indicating that I was, in fact, on an alternate Earth in the middle of the eleventh century. I struck up acquaintances with connections in the Varangian Guard, and made my way to Constantinople, where the real action was. I hope I may tell you a little of the situation it gave me, because it sheds interesting light on the strengths and weaknesses of these models.

In our world, at the time of my arrival, the Byzantine Empire was at its post-Justinian peak, ruling territories from Croatia to Syria, wealthier and more powerful than any other state west of China. But it was in the early stages of being ruined by some of the most outrageous misgovernment in history, and the Turks were moving westwards out of Persia to take advantage of its growing weakness. Emperor Michael IV, a weak but well-meaning ruler, had just completed a highly successful campaign to subdue a rebellion in Bulgaria, and being a highly pious man, he celebrated his victory by building numerous churches around the Empire. But he had always been troubled with epilepsy, and in his condition, weakened by frequent seizures, even minor wounds could be dangerous. He was, in the natural course of events, to die of gangrene in the legs within a few months.

It was at this point that the LLM dumped me in Constantinople, to work a ‘Connecticut Yankee’ on the situation.

I shall not bore you with details, except to say that I averted the Emperor’s untimely death, packed off his scheming nephew (in our history, the utterly worthless and incapable Michael V) to a monastery, and set about reversing the decay and bureaucratic drift that was already beginning to weaken Byzantium. The LLM made quite an entertaining pastime of this excursion into alternate history. However, it had particular weaknesses that strikingly revealed the limits of the technology.

Some minor points: The model tended to glitch between present- and past-tense narration, and sometimes changed persons as well, though it usually settled on second-person present, which was satisfactory for the purpose. It had a tendency to reuse the same names when introducing new characters, so I had to override it and supply a greater variety of names myself. Zoe, for instance, was quite a common name in eleventh-century Byzantium, but not nearly so common as the model made out; and since the Empress Zoe was a considerable character in the story, it was necessary to change the names of the other Zoes that tended to crop up, to avoid confusion.

As the scenario diverged further from our own history, the problems grew more serious. The model kept cribbing from actual history, forgetting who the current Emperor was, and sometimes anachronistically introducing characters who would have been mere children at the time when the story was set. I had to correct it on numerous points; fortunately, it accepted the corrections, and had been programmed to apologize graciously for its errors. (Not all such LLM clients are so well-behaved, as I found.)

The model tends to introduce additional complications and characters as a way of maintaining narrative tension, rather than adding elements to make the existing situation more challenging – which is the approach most human writers would wisely take. It created a tapestry of truly Byzantine intrigue, which is good, but kept losing track of the threads, which is very bad. Occasionally, when taking up a thread that had been neglected for a little while, I had to remind it who all the dramatis personae were and what they had last been up to: apparently it was not clever enough to carry its own earlier output forward as context for subsequent scenes of the same story. This all made the story rather reminiscent of something by George R. R. Martin.

One rather pleasant strength of the model is its ability to recognize allusions and play along with them. One of the major political issues of the time was the growing antagonism between the Catholic and Orthodox churches, which would soon culminate in the Great Schism of 1054. The model was capable of talking quite intelligibly about the Filioque controversy, the claims of papal supremacy, and other relevant issues. I even paid a visit to Rome to treat with the scandalous Pope Benedict IX, who (in this alternate history) was a roisterer, a womanizer, and a cad, but nevertheless an intelligent man who could at least understand the implications of a split in the Church for his own position. (I had just helped the Normans move into Sicily against the Saracens, and arranged for a Norman baron to be crowned king by a Byzantine bishop. The Pope was terrified at the thought that such a king might march on Rome next.) In the course of these ecclesiastical affairs, I dropped various references to Latin and Greek authors, in the original languages when I could conveniently get them; the model had no trouble recognizing my sources, translating them, and playing along smoothly. This was great fun.

But the weaknesses predominate. The model simply does not grasp the structure of a plotted story – not even the try/fail cycle, or the triangle of ascending and descending action, or the three-act structure – none of the forms that hack writers use to produce low-grade fiction, let alone the subtler ways of plotting that underlie the best literature. It does grasp Raymond Chandler’s tongue-in-cheek advice to writers: ‘When in doubt, have a man come through a door with a gun in his hand.’ Several times it ginned up a Turkish invasion, or a rebellion in the Balkans, or some such external and violent trouble to make things difficult for my protagonist. Of course, that also made things even more difficult for the model, which was already swamped with more story details than it could keep straight.

At times (and this goes straight to the heart of what is wrong with LLMs) the model simply lost sight of the difference between history and pulp fiction. An expedition to Egypt, to overawe the weakling Caliph al-Mustansir and bring him onto the Empire’s side, was derailed into something like a bad horror-movie scenario. It afflicted the Caliph with possession by a conveniently buried Egyptian god, a nasty customer who had to be exorcised by a very precise formula. (It is apparently vital, when dealing with Egyptian death-gods, to spurn them with the left foot. What a one-legged priest is supposed to do in such a case, the surviving documents do not say.) At another point, the model introduced a secret pulp villain with the hackneyed name of ‘The Shadow’ to tempt me into a wild-goose chase for an ancient ‘artefact’ that looked suspiciously like something out of a Dungeons & Dragons game.

The truth is, these models know nothing about reality; they only know words, and they don’t attach referents to those words. They only know, by strong probabilistic inference, that in a large body of training text, word A is frequently found in the vicinity of words B, C, and D, and they are good enough at analysing the grammar of phrases and clauses to cash in on the relevance of A to its neighbours. Naturally, they rely even more heavily than Homer on stock phrases and repeated epithets. I cannot tell you the number of Byzantine Greeks that the model described as having ‘neatly trimmed beards’. I could have made a lethal drinking game of it. A swig of ouzo for every neatly trimmed beard in the story would have made short work of my liver.
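The point can be seen in miniature. Here is a toy sketch in Python – purely my own illustration, far cruder than a real transformer, but in the same statistical spirit – showing how text can be generated from nothing but word-adjacency statistics, and why stock phrases win:

    # Toy bigram model: it knows only which words were seen after which.
    # Real LLMs use far richer context, but the principle is the same:
    # words predicted from neighbouring words, with no referents attached.
    import random
    from collections import Counter, defaultdict

    corpus = ('the envoy with the neatly trimmed beard bowed to the emperor '
              'the eunuch with the neatly trimmed beard bowed to the empress').split()

    follows = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        follows[a][b] += 1                       # count: b observed after a

    def next_word(word):
        # Sample a successor in proportion to how often it was observed.
        options = follows[word]
        return random.choices(list(options), weights=options.values())[0]

    word = 'neatly'
    output = [word]
    for _ in range(3):
        word = next_word(word)
        output.append(word)
    print(' '.join(output))                      # prints: neatly trimmed beard bowed

The toy model produces ‘neatly trimmed beard’ every time for the same reason the real one did: in its world of words, nothing else ever follows ‘neatly’.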

Some of the faults of the models could be remedied by the use of a sort of recursive story bible. The model begins with a scenario provided by a third party – one has to pay extra for access to the innards to write one’s own scenarios – which does, in effect, keep the story on the rails in the early going. By the time I reached Byzantium, I had left the rails far behind, and left the model to its own devices, which doubtless aggravated its sometimes strange behaviours. If it could be made to add a description of each new character, his name and position, as he appears; and if it could do something similar with recurrent locations and the dates of past events; then it would be able to organize a much larger story in a more plausible way, without relying on the user to niggle his way along by reminding it of disremembered details. This is something most human writers would do as a matter of course; but the LLM is concerned only with the surface text, and does not grasp the underlying structures of the story, or even perceive that there are any. All its work is done on the textual level, and none on the level of the elements of fiction as taught in creative-writing classes.
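For the programmers among my Loyal Readers, here is a rough sketch of what I mean. The generate() stand-in and all the names are my own invention, not a feature of any existing tool: the bible is updated after each scene and prepended to the next prompt, so the model need not remember anything across scenes on its own.

    # Sketch of a 'recursive story bible', kept outside the model and
    # fed back in with every prompt.
    story_bible = {
        'characters': {},   # name -> description, position, current status
        'locations':  {},   # name -> description
        'events':     [],   # (date, summary) pairs, in story order
    }

    def bible_as_text(bible):
        lines = ['STORY BIBLE (authoritative; do not contradict):']
        for name, desc in bible['characters'].items():
            lines.append(f'Character: {name} -- {desc}')
        for name, desc in bible['locations'].items():
            lines.append(f'Location: {name} -- {desc}')
        for date, summary in bible['events']:
            lines.append(f'Event ({date}): {summary}')
        return '\n'.join(lines)

    def generate(prompt):
        # Hypothetical stand-in for the real LLM call.
        return f'[model output for a prompt of {len(prompt)} characters]'

    def write_scene(bible, instruction):
        return generate(bible_as_text(bible) + '\n\n' + instruction)

    story_bible['characters']['Harald'] = 'Varangian officer; ally of the narrator'
    story_bible['events'].append(('A.D. 1041', 'Bulgarian revolt subdued'))
    print(write_scene(story_bible, 'Continue the scene in the palace.'))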

This leads me to a point that I have been brooding over for many years; and perhaps it is time that I wrote the monograph that I have long intended to write, but never saw a need for until now. The fact is that stories are not made up of words and sentences, the units that a large language model is equipped to deal with. They are made up of scenes, plot events, characters, and motivations, each of which can be expressed in words in many different ways. This is why a Greek myth can be translated into English, or a novel into a movie, and still remain ‘the same story’. The medium is fungible, so long as the structure is preserved; there are many ways of putting flesh on the same set of bones.
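Indeed, the bones can be written down as data, quite apart from any particular fleshing-out in words. A crude Python sketch – the names and fields are, of course, my own invention:

    # Story 'bones' as medium-neutral data. The same structure could be
    # fleshed out as a myth, a novel, or a film script.
    from dataclasses import dataclass, field

    @dataclass
    class Character:
        name: str
        motivation: str

    @dataclass
    class PlotEvent:
        agent: str          # who acts
        action: str         # what happens, in medium-neutral terms
        consequence: str    # how the situation changes

    @dataclass
    class Story:
        characters: list = field(default_factory=list)
        events: list = field(default_factory=list)

    orpheus = Story(
        characters=[Character('Orpheus', 'to recover his dead wife')],
        events=[PlotEvent('Orpheus', 'descends to the underworld',
                          'wins Eurydice back, on one fatal condition')],
    )

Nothing in that structure dictates whether the telling comes out as Greek hexameter, English prose, or a screenplay; that is precisely the sense in which the medium is fungible.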

Literary critics have an unwholesome tendency to dissect the flesh in minute detail and ignore the bones; computer programmers try to make do without the bones, because they don’t have the tools to express them symbolically and make them amenable to analysis. In neither case do they touch the experience that a reader has in enjoying a story. I believe we need something like a formal model that at least addresses the existence of the bones, to break down and explain the ways and degrees in which a story can be altered and still remain ‘the same’. I should like to explore that in detail in coming essais, if my 3.6 Loyal Readers will bear with me.

Your thoughts and suggestions are much appreciated.

Comments

  1. Mary Catelli says

    Hmmm. Fairy tale variants get fun. They offer many chances to see the changes in action.

  2. Interesting investigation; thank you for sharing. Your storytelling AI sounds like it did better than I would’ve expected, but I’m not surprised to hear about the limits it still has.

    And I’m very glad you’re still occasionally blogging!

  3. I tried to use an LLM to help me write weekly progress reports for my students, basically taking the quick notes I had jotted down in a Word document and turning them into something professional.

    The results were…very interesting. I gave the LLM (in this case ChatGPT) my notes, an example document of what it should look like, and very specific instructions on what to do. However:

    – It only did the progress notes for three of my five students, and on some of them it combined multiple days into one until I specifically told it not to.

    – To get it to do all of my students, I had to take the notes portion of the document I attached and laboriously copy and paste it into the AI. It was simply not reading it from the document itself.

    – I told it to add the same assignments to the beginning of the AM and PM sessions of each day. Instead of doing that for all students, it just repeated that assignment over and over again for one student. I had to add that bit manually.

    Did this save me time, in the end? Not sure. Maybe. But did it save me effort? Yes. Most of what I then had to do was correct formatting errors and make small corrections, as opposed to writing up the whole thing. So ultimately it was a net positive.

    I may try this again with a different LLM next time.

    • That’s the thing. Giving instructions to an LLM isn’t like computer programming, where you tell it to do $THING and it does $THING or dies trying. The LLM tries to interpret your instructions the same way it handled its training text – by associating words with other words and guessing what would be a cromulent response.

      It doesn’t help, when you want it to iterate a task a certain number of times, that LLMs are not really designed to notice when language is being used mathematically. I understand that one of the classic ways to trip up earlier versions of ChatGPT was to ask, ‘How many times does the letter R occur in the word “strawberry”?’ Because it tokenized words into multi-letter chunks before interpreting them, it never operated on the individual letters of the input at all… so it got the answer wrong.
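      (If you are curious, the effect is easy to see with OpenAI’s tiktoken library, assuming you have it installed; cl100k_base is one of the encodings used by ChatGPT-era models:

          # pip install tiktoken -- the model works on token IDs, not letters.
          import tiktoken

          enc = tiktoken.get_encoding('cl100k_base')
          ids = enc.encode('strawberry')
          print(ids)                             # a handful of integer IDs
          print([enc.decode([i]) for i in ids])  # multi-letter chunks, not ten letters

      Counting the R’s inside those chunks is simply not an operation the network ever performs on letters.)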

      I have found LLMs most useful as a kind of verbal sparring partner, especially if I want to work up a character’s voice by practising rapid-fire dialogue. I don’t have to worry about the story going anywhere, I can throw away the results with a good conscience, and the machine’s responses can be illuminating. It’s a good way of getting over certain varieties of writer’s block.

      • Yeah. My thought was that for my particular task it was a useful tool, but if I was dumb enough to just copy and paste the results without checking them, I deserved what I got.

        It is quite good at designing simple lesson plans, though.

    • I use Whisper (usually classed as a machine-learning model for speech recognition) for dictation transcription, which it’s pretty good at, and then claude.ai for cleanup (adding punctuation and that sort of thing), for which it’s…adequate? It seems to get less adequate the longer the transcription it’s dealing with. It has helped somewhat for squeezing zero drafts into otherwise wasted time while commuting or stuck in meetings (where I can write longhand and dictate to the computer later).

  4. I enjoyed this; it sounds like a fun, if somewhat harrowing, update on the classic choose-your-own-adventure stories of my youth.

    As an example of the extent to which LLMs are limited by their training set, the one I mostly use usually manages to keep the characters and setting details straight when I’m using it to brainstorm on my “Pride and Prejudice in Outer Space” project, or to do dictation cleanup on it. It gets befuddled more quickly when dealing with my other WIP, which is book three in a complicated steampunk setting with vague Tolkien and Ruritanian influences and somewhat more obvious Hammer/Roger Corman gothic influences.
