Samuel Beckett
Digital Manuscript Project

A Practical Guide to Digital Genetic Editing

Dirk Van Hulle, Vincent Neyt, Wout Dillen

1. Gathering and Classifying Documents

To explore the practical aspects of genetic criticism, the genesis of Samuel Beckett’s play Krapp’s Last Tape (1958) will serve as an example. The play is a portrait of an artist as an old man, who has the habit of spending his birthday recording a tape about the past year and of listening to older tapes. The text of the play and the documents pertaining to its textual genesis can be accessed in the Beckett Digital Manuscript Project at

The first step is to find as many traces of the genesis as possible, usually in university archives and holding libraries. For research on modern manuscripts in the English language, the relevant manuscripts will often be held in archives on both sides of the Atlantic. The traditional way of proceeding is to select all the manuscripts that belong to the genesis of one particular work. And for the practical purposes of this tutorial, it is indeed useful to limit the example to one work. In general terms, however, it should be noted that delimiting the material pertaining to one single work is not always easy, for after finishing one work and starting on another one, an author may try out various dead-end drafts that did not make it into any published text, at least at first sight. But often these culs-de-sac did have a function. They were necessary for the author, even if only to realize that this was not the way to proceed. As a successful thriller author, Dan Brown advises aspiring writers: ‘If you’re not sure what to write, write the wrong thing a few times, and let that be the process by which you find the right thing.’ [1] This has consequences for the genetic dossier, especially for the question: where does a work’s genesis begin? For instance, in the case of Kazuo Ishiguro’s The Remains of the Day, the actual drafts of the novel strictly speaking were preceded by drafts for several other writing projects that somehow contributed to its genesis. Instead of focusing on only one work, it is therefore often useful to zoom out and consider the genesis as part of an entire oeuvre in progress.

To classify the documents and create some order in the – sometimes chaotic – material, it is useful to make an initial distinction between ‘endogenesis’ and ‘exogenesis’. The terms endogenesis (the sketches, drafts, typescripts, proofs, leading up to the first publication) and exogenesis (the external source texts the author made use of to write the text) were coined by Raymonde Debray Genette in 1977. [2] Pierre-Marc de Biasi later reintroduced these terms in his typology of genetic documentation, admitting that this division is evidently somewhat artificial, since every ‘exogenetic’ note already becomes ‘endogenetic’ the moment it is written down (Biasi 1996, 45-46).

Within genetic criticism, there has been a bias towards what precedes the so-called ‘bon à tirer’ (‘pass for press’) moment. The genesis of a literary work, however, often continues after publication. Because the transformations of the text after this moment take place in a public environment, they do not correspond to the same logic of the writing process as the ‘avant-texte’, according to Pierre-Marc de Biasi (1998, 43). For a long time, this other logic has shaped the dominant model of genetic criticism, most explicitly visualized in Pierre-Marc de Biasi’s typology of genetic documentation (Biasi 1996). While Debray Genette’s dichotomy framed the creative process in terms of an inside and an outside, de Biasi’s model frames it in terms of a private and a public environment. The so-called different logic of the ‘publication phase’ according to de Biasi’s typology seems to have implied that the continuation of the genesis after the ‘bon à tirer’ moment did not belong to the realm of genetic criticism but rather to that of textual criticism and scholarly editing. But if writers translate their own writings (as in the case of an author such as Samuel Beckett), or if the criticism of their work consistently leads to revised editions (as in the case of Charles Darwin), one could argue that the genesis does continue after the ‘bon à tirer’ moment.

To Debray Genette’s terms ‘exogenesis’ and ‘endogenesis’, the term ‘epigenesis’ was therefore added to denote the continuation of the genesis after publication (Van Hulle 2007). This type of genesis is not part of what Raymonde Debray Genette called ‘endogenesis’; it is no longer part of the ‘inside’ (endo-) of the genesis, but follows ‘after’ it. [3] The term ‘epigenesis’ takes stock of the apparent polarity between text and the so-called ‘avant-texte’ but also of other subtle dynamics between ‘exo-’, ‘endo-’ and ‘epigenesis’ in the transition zone, suggesting a triangular model. [4] Like any model, this is a conscious reduction or simplification of the complexities marking the creative process. It functions like James Clerk Maxwell’s colour triangle, based on the combination of the three primary colours at its corners. The rationale behind the triangular model is to present the genesis not so much as a linear development ‘before’ and ‘after’ publication, but as a spectrum of genetic actions, an interconnected structure that makes explicit that at any point exogenetic material may colour the endo- or the epigenesis. It invites readers to view variants not so much as textual differences between editions, deviating from the ne varietur or the ‘definitive’ text (a term that is no longer in use in scholarly editing), but as elements in a long, complex genetic process, linking epi- with endo- and exogenesis.

To explore the use of endo-, epi- and exogenesis in classifying the material, the documents pertaining to Beckett’s Krapp’s Last Tape will serve as a sample corpus. It is not necessary to have read or watched this play to understand its genesis, but a few words on its content may help to give a sense of the material. Krapp is a 69-year-old man. It is his birthday. Every year, on his birthday, he records a tape with impressions of the year that has just passed. This year, before he starts recording, he listens to a tape he made thirty years earlier, the year he decided to break up with his love (the so-called ‘farewell to love’ scene in a punt) to devote his life instead to writing his ‘magnum opus’ after having had a ‘vision’. The way he reacts to this decision thirty years later suggests he thoroughly regrets it. He keeps rewinding the tape to the ‘farewell to love’ scene, listening to it until the tape runs on in silence.

1.1. Endogenesis

The beginning of a work’s genesis is not always easy to determine. The case study, Krapp’s Last Tape, is relatively straightforward when it comes to delineating the start of the endogenesis. Even though it will be necessary to nuance the seemingly clear-cut nature of this start because of concurrent writing projects, it is safe to say that the first draft of this play is a handwritten text called ‘Magee monologue’, written for the actor Patrick Magee in a document referred to as the ‘Eté 56’ Notebook (because Beckett started writing in it in the summer of 1956).


1.    Manuscripts: There is only one (fully) handwritten document pertaining to the genesis of Krapp’s Last Tape, preserved at the University of Reading, call number UoR MS 1227-7-7-1, also known as the ‘Eté 56’ Notebook

2.    Typescripts and copies of typescripts: The typescripts that have been preserved are kept at the Harry Ransom Center in Texas (HRC), at the University of Reading (UoR), and at the University of California, San Diego (UCSD):

HRC SB 4-2-1 [5] : typescript with autograph corrections

HRC SB 4-2-2: typescript with autograph corrections

HRC SB 4-2-3: typescript with autograph corrections

HRC SB 4-2-4: typescript with autograph corrections

HRC SB 4-2-5: thermofax copy of (the carbon copy of) a missing typescript

UoR MS 1659: typescript with autograph corrections

UCSD AS 103-74-10: typed copy with autograph corrections

The last item in this list is interesting because it was found among the papers of Alan Schneider, who directed the first US performance of the play at the Provincetown Playhouse, New York, on 14 January 1960 (with Donald Davis as Krapp). It is quite possible, even probable, that the typescript was not typed by Beckett. [6] The typescript has a place in the endogenesis (the succession of drafts before publication), but in the margins it also shows some marks (such as the letters A and B) that probably refer to different cameras, for a TV version of the play (see below, ‘TV Krapp’). In that respect, the typescript also belongs to the epigenesis.
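The observation that one document can belong to more than one corner of the triangular model can be sketched as a small data structure. The following Python fragment is only an illustration: the names `Genesis`, `Document` and `in_category` are hypothetical, and the category flags are this sketch's reading of the list above, not an established encoding.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Genesis(Flag):
    """The three corners of the triangular model (hypothetical encoding)."""
    EXO = auto()   # external source texts
    ENDO = auto()  # drafts leading up to the first publication
    EPI = auto()   # continuation of the genesis after publication


@dataclass(frozen=True)
class Document:
    call_number: str
    description: str
    genesis: Genesis


# A few entries from the list above; the flags are assumptions of this sketch.
documents = [
    Document("UoR MS 1227-7-7-1", "'Eté 56' Notebook", Genesis.ENDO),
    Document("HRC SB 4-2-1", "typescript with autograph corrections", Genesis.ENDO),
    # The Schneider typescript is a draft, but it also carries later camera marks.
    Document("UCSD AS 103-74-10", "typed copy with autograph corrections",
             Genesis.ENDO | Genesis.EPI),
]


def in_category(docs, category):
    """Return the call numbers of all documents tagged with `category`."""
    return [d.call_number for d in docs if category in d.genesis]


print(in_category(documents, Genesis.EPI))
```

Using combinable flags rather than a single label per document keeps the classification from forcing a document such as UCSD AS 103-74-10 into only one category.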

1.2. Epigenesis

Pre-Book Publications

The play was published in Evergreen Review II.5 (Summer 1958, pp. 13-24). In the Alan Schneider papers at UCSD, there is a copy (dedicated to Schneider by the actor Donald Davis) with annotations by Schneider, but also with one annotation in Samuel Beckett’s hand. On page 14, next to the scene in which Krapp fetches the ledger, Beckett wrote:

If necessary


(He looks at cover / of ledger, reads.) / Krapp year by year. / (The Krapp annual.) / (He opens ledger, / bends over it, / etc.[)] (BDMP 3, p. 123)

This note shows how the genesis continues after publication. In this case, it shows Beckett’s development as a playwright, more specifically his increasing awareness of the audience and of the need to think not just ‘textually’ but also in terms of what works on stage.


KLT 1959        Krapp’s Last Tape and Embers (London: Faber and Faber, 1959).

KLT 1960        Krapp’s Last Tape and Other Dramatic Pieces by Samuel Beckett. First Evergreen Edition (New York: Grove Press, 1960)

KLT 1964        Krapp’s Last Tape, in: Dramatische Dichtungen in drei Sprachen vol. 2 (Frankfurt am Main: Suhrkamp Verlag, 1964).

KLT 1970        Krapp’s Last Tape and Other Dramatic Pieces, The Collected Works of Samuel Beckett (New York: Grove Press, 1970).

KLT 1974        Krapp’s Last Tape, in: Das letzte Band. Krapp’s Last Tape. La Dernière Bande. Frankfurt am Main: Suhrkamp Verlag, 1974.


Beckett was a bilingual author. He sometimes wrote in English, sometimes in French, and translated most of his works into the other language – increasingly systematically towards the end of his career. In the case of Krapp’s Last Tape, however, a first version of the translation was made by Pierre Leyris. It was sent to Beckett, who was not too satisfied with it, and they discussed it. By 11 October 1958, Beckett had finished the revision of Leyris’s translation (Pilling 2006, 142). They were both mentioned as translators in the first edition. The revised version was typed up and a carbon copy (showing only a few corrections and additions in pencil, red and blue inks) is held at the HRC (HRC SB 4-2-6). The carbon copy does not indicate the name of the translator(s), but the Suhrkamp archive at the Deutsches Literaturarchiv (DLA) in Marbach contains an early photocopy of this typescript, which does have a title page with the name of Pierre Leyris as the sole translator. This photocopy was sent to Suhrkamp by the German translator Elmar Tophoven (22 December 1958).

A pre-book publication appeared in the magazine Les Lettres Nouvelles, and the title verso of the first edition indicates that it came out in 1959. A limited edition of 40 copies on special paper (‘pur fil Marais’) and 7 non-commercial copies (‘hors commerce’) were printed in December 1959 (BDMP 3, 85). The date of printing (‘achevé d’imprimer’) of the first commercial edition is actually 5 January 1960, but since the limited edition was already printed in December 1959 and copyright was claimed before the end of the year, all the later editions indicate ‘1959’ as the year of publication. 

LLN                  La Dernière Bande in Les Lettres Nouvelles, ed. by Maurice Nadeau, 7° année, Nouvelle Série, N° 1 (4 March 1959): 5-13.

LDB 1959       La Dernière Bande, suivi de Cendres (Paris: Les Éditions de Minuit, 1959).

LDB 1960       Lettre Morte (Robert Pinget) et La Dernière Bande (Samuel Beckett). Collection du Répertoire (Paris: Théâtre National Populaire / Les Éditions de Minuit, 1960).

One of the reasons for choosing a play as the case study for this discussion is that its epigenesis is often more diverse and rich than that of a poem or a piece of prose, as the following documents show. Especially when the playwright is involved in the rehearsals or even the direction of their play, this often results in numerous epigenetic changes to the play.

Acting Copies

A set of acting copies for the 1973 London Royal Court Theatre production directed by Anthony Page is preserved at the Harry Ransom Center, Texas (HRC SB 5-4).

Production notes

‘Schneider TS’: UCSD preserves a carbon copy of a typescript of Krapp’s Last Tape, annotated by Alan Schneider. Although the annotations are not in Beckett’s hand, they do sometimes contain first-hand information when they are combined with the correspondence between the author and the director. For instance, ‘These old P.M.s’ in the typescript are marked with a note in the top margin: ‘post mortems’. Schneider had asked Beckett whether ‘post mortems’ was what ‘P.M.s’ stood for, which Beckett confirmed in a letter of 4 January 1960 (NABS 59).

‘Schneider TS2’: A second set of typescripts was made for Schneider’s production to facilitate the recording of the voice on the tape, in three parts.

‘Lilly ms’: For the 1960 first French production (Théâtre Récamier, Paris, 2 March 1960), directed by Roger Blin, Beckett annotated a copy of the first Minuit edition (1959). This annotated copy is preserved at the Lilly Library in Bloomington, Indiana (see BDMP 3, 234ff.).

‘Schiller nb.’: In 1969, Beckett directed his own play for the first time in the workshop of the Schiller-Theater in Berlin, with Martin Held in the role of Krapp (premiere: 5 October 1969). Beckett’s production notes are written in a small notebook (22 x 13 cm), published by Faber and Faber (TN3).

‘Martin a.c.’: The actor Jean Martin annotated his French script for Beckett’s 1970 Théâtre Récamier production.

‘Faber a.c.’: Beckett annotated a copy of the 1970 Faber and Faber edition for the 1973 Royal Court production with Albert Finney as Krapp, directed by Anthony Page (16 January 1973; UoR MS 1227-7-10-1). For instance, on page 11, Beckett noted in the margin: ‘Action interrupted by Hain 1’.

‘Grove a.c.’: For the same production, Beckett also annotated a copy of the 1960 Grove Press edition of Krapp’s Last Tape and Other Dramatic Pieces. For instance, on page 13, Beckett marked in the margin: ‘Action interrupted by first look over his shoulder left into darkness backstage’.

‘Suhrkamp 3’: A copy of the trilingual edition Das letzte Band. Krapp’s Last Tape. La Dernière Bande (Frankfurt am Main: Suhrkamp, 1974) contains revisions of the English text (TN3 xxxiii).

‘San Quentin notes’: For a production of Krapp’s Last Tape by the San Quentin Theatre Company, Beckett prepared three leaves of notes (UoR MS 2101).

‘Haymarket Theatre notes’: Beckett made four sheets of notes for a production at the Haymarket Theatre in Leicester in 1989 (UoR MS 3507).


‘TV Krapp’: On 31 January 1969, Beckett sent some ‘Krapp notes’ to his American publisher, Barney Rosset (at Grove Press) for a possible adaptation for television. The main idea for this adaptation is to work with two cameras, camera A being ‘mere eye’, while B just ‘listens’: ‘its activity is affected by words spoken’. The degree to which it is affected is marked by means of three ‘levels of intentness’: ‘Low’ (for all references to health and work), ‘Intermediate’, and ‘High’ (all references to Krapp’s old loves). The way these ‘High’ levels of intentness should be marked is by freezing the frame. UCSD also holds a complete shooting log of the production on 2 June 1971. In the meantime, Beckett had been approached by the WDR (Westdeutscher Rundfunk, Cologne) about a television version of his production of the play.

‘McWhinnie a.c.’: Copy of the 1960 Grove Press edition annotated by Beckett for Donald McWhinnie’s 1972 BBC Television production, broadcast on 29 November 1972, with Pat Magee as Krapp. The annotated copy of the BBC script is kept at the University of Reading (UoR MS 3071; 30 leaves).

Opera adaptation: Marcel Mihalovici’s manuscript notebook, preserved at the University of Reading (UoR MS 1227-7-10-2), contains the French (blue), German (pencil) and English (red) versions of the opera Krapp, ou La Dernière Bande.

1.3. Exogenesis

The fact that the terms ‘exogenesis’ and ‘endogenesis’ were introduced as a pair may suggest that exogenesis only refers to source texts that were consulted or used by the author during the endogenesis. But external elements keep playing a role in the genesis even after publication, as a few examples from Krapp’s Last Tape will illustrate.

Exogenesis - Endogenesis

Take for instance the obsolete term ‘wearish’, the first word of the first draft. Krapp is described as a ‘wearish small old man’. Although, in and of itself, this seems an inconspicuous stage direction, the unusual word draws attention to itself and thus invites further investigation. The same phrase recurs in Beckett’s early works. The archaic adjective ‘wearish’ occurs a few times in the story ‘Echo’s Bones’, [7] and in the poem ‘Enueg I’ the ‘wearish old man’ is linked to Democritus. [8] As John Pilling points out, this passage is based on a note in Beckett’s ‘Dream’ Notebook, derived from Robert Burton’s The Anatomy of Melancholy. [9]

Another remarkable word in the play is ‘chrysolite’, when Krapp describes the eyes of a ‘dark young beauty’ he saw when he used to sit on the ‘bench by the weir’: ‘The face she had! The eyes! Like...(hesitates)...chrysolite!’ (1958, 19) The word is an allusion to Shakespeare’s Othello:

If heaven would make me such another world
Of one entire and perfect chrysolite
I’d not have sold her for it. (qtd. in Van Hulle 2015, 121; 210)

Beckett wrote these three lines in the top margin of his annotated copy of the Faber edition, probably to help the director, Anthony Page, explain the intertextual reference to the actors of the 1973 Royal Court Theatre performance – which takes us into the realm of ‘epigenesis’.

Exogenesis - Epigenesis

As mentioned above, exogenetic source texts can play a role in the genesis even after publication. When Beckett was directing his play in Berlin in 1969, he decided to emphasize the contrast between light and darkness that is prominent in the play. To do so, he read up on the Persian religion of ‘Manichaeism’ in his copy of the Encyclopedia Britannica. He copied a few lines almost literally in his production notes. They are excerpted from Adolf Harnack and Frederic Cornwallis Conybeare’s article on ‘Manichaeism’ in the encyclopedia. Beckett’s theatrical notebook contains three pages about this religion of Mani, a Persian prophet, based on the strict separation of light and darkness. The notes contain two lists of ‘light emblems’ and ‘darkness emblems’ (UoR MS 1396-4-16, 24r) which are taken directly from this passage in the encyclopedia article: ‘As the earth of light has five tokens (the mild zephyr, cooling wind, bright light, quickening fire, and clear water), so has the earth of darkness also five (mist, heat, the sirocco, darkness and vapour)’ (Encyclopedia Britannica vol. 17, 573; see also Beckett Digital Library). 

This exogenetic element had an impact not only on the rehearsals of the performance, but also on the epigenetic development of the text. When he had the chance to revise his texts for a trilingual edition by the German publishing house Suhrkamp (1964), Beckett changed the material of the boxes in which Krapp keeps his reels from ‘cardboard’ to ‘tin’ (Van Hulle 2015, 90). Since he had made this change in English, he also adapted his French translation accordingly. Thus, ‘boîtes en carton’ became ‘boîtes en fer blanc’ (2015, 89). The word ‘blanc’ (white) introduced more light into the play, as it were, and – to push the Manichaean dichotomy to the extreme – Beckett therefore felt obliged to restore the balance, so to speak, by adding a dark element further on in the play, which only occurs in the French text: ‘mon vieux point faible’ (my old weakness) became ‘mon vieux point noir’ (2015, 98).

Finally, another word that gives away an exogenetic reference in an epigenetic context is ‘Hain’, added in the margins of Beckett’s annotated copy of the Faber edition (see above, ‘Faber a.c.’). ‘Hain’ is not an English word. It refers to a drawing of the Grim Reaper (in German: ‘Freund Hain’) in Beckett’s copy of the complete works of Matthias Claudius, who dedicated them to death. One of the lines Beckett wrote in the margin was ‘Action interrupted by Hain 1’. What this rather cryptic annotation means becomes clearer when we compare it to the annotation ‘Action interrupted by first look over his shoulder left into darkness backstage’, which Beckett wrote, at the corresponding instance, in the margin of his copy of the 1960 Grove Press edition (see above ‘Grove a.c.’): at three instances in the play, Beckett decided (after the play had been published) that the actor had to look over his shoulder into the dark, as if he felt the presence of the Grim Reaper.

2. Compiling a genetic dossier

Since genetic criticism includes the temporal dimension in literary studies, chronology is the backbone of the genetic dossier. To some degree, an ‘absolute’ chronology can be reconstructed thanks to dated manuscripts or references to manuscripts in letters and diaries. But many manuscripts are undated, which necessitates the construction of a ‘relative’ chronology by comparing or ‘collating’ the textual versions. This also means it is necessary to distinguish between documents and versions.

2.1. Absolute chronology

Some writers date each of their manuscripts or typescripts. Some even mark the date of every single writing session. In the digital age, with the use of keystroke logging, every single keystroke has a time stamp. In most cases, however, a lack of chronological data is what complicates the genetic analysis. In case the manuscripts do not feature any dates, other parts of what Stan Gontarski has dubbed the ‘grey canon’ can be of help, for instance letters and diaries.

It is useful to draw up a dry account of dates and facts in order to compile the genetic dossier, and to let this ‘absolute’ chronology start well before the actual start of the first draft, because very often the ‘pre-history’ of a genesis already contains elements that contributed to the inception of an idea. In the case of Krapp’s Last Tape, for instance, the fact that the BBC decided to broadcast fragments from Beckett’s prose works (Molloy, Malone Dies, From an Abandoned Work) in the winter of 1957-1958 inspired Beckett to work with the tape recorder as a central element in his play.

2 December 1957: Beckett takes a week off from his translation of the novel L’Innommable into English to try and write a new radio play (Embers).

10 December 1957: Patrick Magee reads fragments from Molloy and From an Abandoned Work on the BBC.

26 December 1957: Beckett lays aside the radio play and returns to his translation of L’Innommable.

12 January 1958: Beckett lunches with George Devine, who suggests he write a monologue for Patrick Magee; he continues working on the radio play.

21 January 1958: Beckett temporarily abandons the radio play and continues translating L’Innommable (and even marks this moment in the manuscript of his translation).

6 February 1958: Beckett is still ‘struggling with’ his translation of L’Innommable.

20 February 1958: Donald McWhinnie makes suggestions for sequences from Malone Dies to be read by Magee on the BBC; on the same day, Beckett writes the first draft of Krapp’s Last Tape (at that moment still called ‘Magee Monologue’); at the same time, he is still translating L’Innommable.

23 February 1958: Beckett finishes his translation of L’Innommable.

10 March 1958: Beckett writes to his American publisher that he has finished the ‘Magee Monologue’ and will ‘let it forth, in about a fortnight probably’.

15 March 1958: Beckett acknowledges receipt of a complete 11th edition of the Encyclopedia Britannica, sent to him by the book and manuscript dealer Jake Schwartz.

25 March 1958: Beckett sends Schwartz four typescripts of Krapp’s Last Tape (now held at the HRC).

18 March 1958: Beckett sends Krapp’s Last Tape to his American publisher, after having sent copies to George Devine and Patrick Magee, the playwright and literary agent Kitty Black and his literary agent Rosica Colin.
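A ‘dry account of dates and facts’ of this kind lends itself to a simple machine-readable form. The following Python sketch is only an illustration: the event summaries are abbreviated paraphrases of the entries above, and the helper `out_of_sequence` is a hypothetical name. It stores the entries in the order in which they are listed, sorts them by date, and flags any entry whose date is earlier than that of the entry before it. In the list as given, this applies to the entry for 18 March 1958, which follows the entry for 25 March.

```python
from datetime import date

# (date, abbreviated event) pairs, in the order in which they are listed above
events = [
    (date(1957, 12, 2), "takes a week off from translating L'Innommable for a radio play"),
    (date(1957, 12, 10), "Magee reads from Molloy and From an Abandoned Work on the BBC"),
    (date(1957, 12, 26), "lays aside the radio play, returns to L'Innommable"),
    (date(1958, 1, 12), "lunch with George Devine, who suggests a monologue for Magee"),
    (date(1958, 1, 21), "temporarily abandons the radio play"),
    (date(1958, 2, 6), "still 'struggling with' the translation"),
    (date(1958, 2, 20), "first draft of the 'Magee Monologue'"),
    (date(1958, 2, 23), "finishes the translation of L'Innommable"),
    (date(1958, 3, 10), "reports the 'Magee Monologue' finished"),
    (date(1958, 3, 15), "acknowledges receipt of the Encyclopedia Britannica"),
    (date(1958, 3, 25), "sends Schwartz four typescripts"),
    (date(1958, 3, 18), "sends the play to his American publisher"),
]


def out_of_sequence(entries):
    """Return entries whose date is earlier than that of the preceding entry."""
    return [cur for prev, cur in zip(entries, entries[1:]) if cur[0] < prev[0]]


# Sorting by date yields the 'absolute' chronology.
chronology = sorted(events, key=lambda entry: entry[0])

for when, what in chronology:
    print(when.isoformat(), what)

print("Listed out of sequence:", out_of_sequence(events))
```

Keeping the listed order and the sorted order side by side makes it easy to check a hand-compiled chronology before relying on it.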

The few dates one can gather from diaries and letters often do not suffice to fill all the gaps in the chronological reconstruction of the endogenesis. Here, a ‘relative’ chronology can be of help. This is one of the core objectives of genetic analysis.

2.2. Relative chronology

Distinguishing between documents and versions

To make a hypothesis as to the chronological sequence of the writing process it is necessary to make a clear distinction between documents and versions. The transition from classifying the material to compiling a genetic dossier implies the ‘translation’ of the documentary material into ‘versions’. Documents are the ‘physical vessel’ that carries information (Shillingsburg 1996, 174), whereas a version has ‘no substantial existence’ (44). One document can sometimes contain more than one version; and vice versa, two or more documents may have been dispersed – for various reasons – but belong together and constitute one version.

To reconstruct the relative chronology of textual versions, it is useful to represent them in a graph, in which the x-axis shows the narrative sequence, and the y-axis the chronology of the writing sequence.

X-axis: narrative sequence

If the narrative sequence is not marked by milestones such as chapters or paragraphs, it may be necessary to divide the text into sections or scenes. To illustrate how this can facilitate the analysis of the genesis, here is an example for Krapp’s Last Tape, dividing it into twelve scenes (the young, middle-aged, and old Krapp will be referred to as Krapp1, Krapp2, and Krapp3):


I           Mime I: stage directions and opening mime

II          ‘Spooool!’: Krapp opens his ledger and finds the right reel

III         ‘My condition’: taped voice of Krapp2, intellectually in his prime

IV         ‘Aspirations’: taped voice of Krapp2, listening to Krapp1

V          Song: Krapp3 sings backstage

VI        ‘Viduity’: widowhood of Krapp’s mother; Krapp3 has forgotten the word ‘viduity’

VII       ‘The vision’: Krapp2’s revelation

VIII      ‘Farewell to love’ I: Krapp2’s taped account of the scene in the punt with his love

IX        ‘Farewell to love’ II: Krapp3 winds back to listen to the same scene

X          Mime II: second scene backstage and preparations for new recording

XI        Recording: including reminiscence of scenes II and V

XII       ‘Farewell to love’ III: Krapp3 listens to the scene in the punt again


Y-axis: chronology of versions

A single document can contain more than one version, and vice versa: one version of the entire text can be a composite of two or more documents. To make a hypothesis about the chronology of the manuscripts and typescripts we need to ‘translate’ the list of documents into a sequence of versions and give them a siglum (a letter or abbreviation denoting a particular document or edition) that reflects their position in the chronology. Since Samuel Beckett was a bilingual author, writing in English or in French, it may be useful to indicate the language in the siglum. In the case of the first item, for instance, EM stands for the English Manuscript, ET1 for the first English Typescript:


UoR MS 1227-7-7-1 (‘Eté 56’ Notebook)     EM

HRC SB 4-2-1                                              ET1

HRC SB 4-2-2                                              ET2

HRC SB 4-2-3                                              ET3

HRC SB 4-2-4                                              ET4

UoR MS 1659                                              ET5

HRC SB 4-2-5 (thermofax copy)                  ETC

UCSD AS 103-74-10 (typed copy)               ETC’


The Reading typescript (UoR MS 1659) has acquired a different position vis-à-vis the list of documents discussed in section 1.1, where they were simply ordered according to the place where they were held (HRC, UoR and UCSD). The rationale behind letting ET5 ‘jump the line’ and putting it in front of ETC (HRC SB 4-2-5) is based on a collation (comparison) of these two documents. Take for instance the first sentence of the stage direction. In ET5 (UoR MS 1659), Beckett made a revision: in ‘A late evening in the nineteen eighties’, he replaced ‘nineteen eighties’ with ‘future’. In ETC (HRC SB 4-2-5) this revision is already incorporated into the typed text: ‘A late evening in the future.’ Since the typed layer of ETC presupposes the handwritten revision on ET5, ET5 must precede ETC.
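The reasoning behind this ordering can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each document is reduced to a single reading with a typed layer and a revised layer, final punctuation is left out, the ETC entry is assumed to carry no further revision at this point, and the helper names `word_diff` and `precedes` are hypothetical. A real collation would weigh many readings, not one.

```python
import difflib


def word_diff(before, after):
    """List word-level changes between two readings as (operation, old, new)."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def precedes(doc_a, doc_b):
    """True if doc_b's typed layer already incorporates doc_a's revision."""
    return doc_a["revised"] == doc_b["typed"] and doc_a["typed"] != doc_b["typed"]


# First sentence of the stage direction, as described in the text.
et5 = {
    "typed": "A late evening in the nineteen eighties",
    "revised": "A late evening in the future",
}
etc = {
    "typed": "A late evening in the future",
    "revised": "A late evening in the future",  # assumption: no further change
}

print(word_diff(et5["typed"], et5["revised"]))
print("ET5 precedes ETC:", precedes(et5, etc))
print("ETC precedes ET5:", precedes(etc, et5))
```

The test is deliberately asymmetric: a revision made by hand on one document and typed on another points in one direction only, which is what allows a relative chronology to be inferred.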

X and Y-axes: versions and the size of the textual unit

When the notion of ‘version’ is used in textual scholarship, editors usually implicitly mean a version of the entire text. But very often, what you encounter in the archive is only a partial version, either because only part of it has been preserved, or because the author rewrote only part of the text at some point. That is why it is always useful to make explicit the size of the textual unit you are working with. The following table uses the twelve ‘scenes’ of Krapp’s Last Tape as textual units. The result is that we can be much more precise in the delineation of partial versions. Thus, for instance, EM turns out to contain 3 different partial versions: EM1 is a version that stops after scene V; EM2 starts only from scene III; and EM3 is a rewriting of only the first three scenes.


  1. EM1       I | II | III | IV | V
  2. EM2       III | IV | V | VI | VII | VIII | IX | X | XI | XII
  3. ET1        III | IV | V | VI | VII | VIII | IX | X | XI | XII
  4. EM3       I | II | III
  5. ET2        I | II | III | IV | V | VI | VII | VIII | IX | X | XI | XII
  6. ET3        I | II | III | IV | V | VI | VII | VIII | IX | X | XI | XII
  7. ET4        I | II | III | IV | V | VI | VII | VIII | IX | X | XI | XII
  8. ET5        I | II | III | IV | V | VI | VII | VIII | IX | X | XI | XII
  9. ETC       I | II | III | IV | V | VI | VII | VIII | IX | X | XI | XII
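The table above can also be held as a small data structure, so that it can be read either per version or per scene. The following Python sketch is only an illustration: the scene spans are copied from the table, while the names `SCENES`, `VERSIONS`, `span` and `paradigm` are hypothetical.

```python
SCENES = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII"]


def span(first, last):
    """Return the run of scenes from `first` to `last`, inclusive."""
    return SCENES[SCENES.index(first): SCENES.index(last) + 1]


# (siglum, scenes covered), in the chronological order of the table
VERSIONS = [
    ("EM1", span("I", "V")),
    ("EM2", span("III", "XII")),
    ("ET1", span("III", "XII")),
    ("EM3", span("I", "III")),
    ("ET2", span("I", "XII")),
    ("ET3", span("I", "XII")),
    ("ET4", span("I", "XII")),
    ("ET5", span("I", "XII")),
    ("ETC", span("I", "XII")),
]


def paradigm(scene):
    """Return the sigla of all versions that contain `scene`, oldest first."""
    return [siglum for siglum, scenes in VERSIONS if scene in scenes]


# Reading across a row gives one version; reading down a column gives one scene.
for scene in SCENES:
    print(f"{scene:>4}: {', '.join(paradigm(scene))}")
```

Calling `paradigm("I")`, for example, lists only the versions in which the opening mime was drafted, which shows at a glance that scene I was not rewritten in EM2 or ET1.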


In the critical analysis (the ‘criticism’ part of genetic criticism), you can thus focus on each of these nine versions as a syntagmatic unit (i.e. the narrative and dramatic development of each version as a whole, consisting of multiple scenes). It is equally possible to focus on the paradigmatic axis (the ‘vertical’, chronological development) of each individual scene.

The whole chronology can be charted in a genetic map, which allows us to be more precise. For instance, the ‘Eté 56’ Notebook (EM) also contains one page (f. 21r) with two short drafts that were written after Beckett had made a first typescript and were meant to be added to scenes II and IV in ET1, so that the partial drafts on two documents (ET1 and EM, f. 21r) – together – constitute Version 3:

It can always happen that there are gaps in the manuscript record, instances where you suspect a manuscript or typescript may be missing. Sometimes, these missing documents resurface, however briefly. For instance, on 8 July 2004, Sotheby’s auctioned a typescript of Krapp’s Last Tape, described in the auction house’s catalogue as ‘the author’s “personal” corrected carbon copy typescript’. Even though the document is not publicly accessible, the half page published in the catalogue allows us to determine that this typescript is actually a carbon copy of ET5 (UoR MS 1659), on which Beckett has added the same corrections. [10]

3. Transcription

So far, we have been dealing with manuscripts, assuming that they were legible, or at least decipherable. There is a difference between deciphering and transcribing. Deciphering is a form of reading, which can be done in silence without any external tool. Transcribing implies both reading and writing: the transcription is made for someone else. This someone else can also be the transcriber’s later self, in the sense that, when they are in an archive and have no means of photocopying, they may want to write down for future reference what they encountered in a manuscript, and they want to do so as precisely as possible, so that later on they or someone else would be able to reconstruct the original on the basis of their transcription.

The two most common ways of transcribing are ‘topographic’ and ‘linearized’ transcriptions. The topographic transcription tries to recreate as much as possible of the ‘topography’ of the manuscript (the precise location of each line, each deletion, each addition), as in this topographic transcription of an early draft of Beckett’s penultimate text, Stirrings Still.



This approach sees the manuscript primarily as an image, with important toposensitive information that needs to be translated into a chronosensitive hypothesis about the writing sequence. Against this background, Daniel Ferrer has noted that a manuscript is not a text, but a protocol for making a text (Ferrer 2011, 43). In the case of the above example, even the checked paper of the original has been imitated. The disadvantage (in a digital context) is that the transcription does not translate the facsimile image into a searchable text, but into another (unsearchable) image (produced by means of graphic software like Photoshop). In the French tradition of genetic criticism, this form of transcription accords with the principle of ‘donner à voir’ (made for looking), as opposed to the principle of ‘donner à lire’ (made for reading).

The latter approach considers it the role of the transcriber to provide a text that facilitates the reading and will therefore tend to linearize the textual features of the manuscript, for instance in this sample from ET1:




The transcription convention of this sample deliberately tries to reduce the number of diacritical signs to a minimum. A ‘Note on the transcriptions’ could read as follows:

The rationale behind the transcription convention is to represent the drafts with as few diacritical signs as possible, presenting deletions by means of strike-through; additions by means of superscript; uncertain readings by means of grey type; and illegible words by means of three x’s (‘xxx’), or only one or two if fewer than three characters are illegible.

The linearized transcription still leaves open many possibilities to mark visual particularities as well. For instance, the difference between typed text and handwritten annotations can be rendered by means of different fonts. The line breaks can be respected. The blank space in front of ‘Street’ can just be marked as a blank space, indicating that the author had not yet found the right name for it. And if the linearized transcription is presented in parallel with a (digital) facsimile, the combination (‘à voir’ + ‘à lire’) is greater than the sum of the parts, as it shows the translation of the toposensitive facsimile into the chronosensitive linearized sequence. In this sample, the ‘Insert’ is marked as a ‘meta-mark’ with a note (a boxed N), explaining that the passage to be inserted can be found on folio 21r of the ‘Eté 56’ Notebook, rather than actually inserting this passage in the transcription of ET1:




4. Digital Genetic Editing

When, as in the case of Samuel Beckett, an author’s drafts are not preserved in a digital format, it is useful to digitize them for both preservation and research purposes. Digital archives and digital editions [11] offer digital facsimiles and transcriptions of modern manuscripts. The scanning of the documents is usually done by a professional team at the holding library, but in case you find a manuscript in a private collection and have the opportunity to make scans, it is useful to be aware of a few basic rules of thumb, such as:

-       to scan the entire document, not just the text of the document; in other words, to keep enough distance (about 1 cm) on all sides of the document between the edge of the document and the edge of the scan;

-       to scan in high resolution (minimum 300 dpi, 24 bit, saved in an uncompressed format such as TIFF);

-       to use a size reference and a calibration strip to enable users to get a sense of the document’s size and measure changes in colour on different computer screens against the colour strip.

To make the text of the document searchable, Optical Character Recognition (OCR) can be an option in the case of printed text. If it is a manuscript or annotated typescript, the available software for Handwritten Text Recognition (HTR) is not quite up to standard yet, at least not to transcribe complex manuscripts. While HTR algorithms are being developed and improved, the more complex aspects of handwritten texts need to be transcribed manually.


4.1. Transcribing in TEI-XML: a practical guide

For short transcriptions in a research essay, transcriptions can be made by means of word processing software, with a transcription convention such as the ‘Note on the transcriptions’ above. For longer transcriptions and editions, one cannot exclude that the word processing software will at some point no longer be supported by any operating system. Since sustainability and longevity are what a scholarly edition typically aims for, editorial teams often choose to use a mark-up language that works with Unicode – a standard, comprising more than 100,000 characters, for the encoding of text in most writing systems – and UTF-8 (Unicode Transformation Format 8-bit) to represent each character of the Unicode character set. To ‘encode’ text in this way, the prevalent mark-up language in scholarly editing at this moment is XML (Extensible Markup Language). It was published by the World Wide Web Consortium (W3C) in 1998 and ‘provides a simple way of representing structured data as a linear stream of character data, and of labelling particular parts of that stream with named tags to indicate structural function or semantics’ (Burnard 2014). What is meant by ‘structural function’ in this definition resembles the matryoshka dolls model. The XML model approaches text in terms of hierarchies. The result is a hierarchical syntax with a ‘nesting’ structure. In this syntax, for example, a word in a poem on death can be conceived as a small matryoshka element within larger structures:
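The nesting can be made tangible with a few lines of Python. The sketch below is only an illustration: the element names (poem, stanza, line, word) and the sample line are hypothetical and do not come from any TEI schema. It relies on the standard library's xml.etree.ElementTree and prints every element at its nesting depth, the smallest doll being the most deeply indented.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-document: element names and words are illustrative only.
SAMPLE = (
    "<poem><stanza><line>"
    "<word>Death</word> <word>be</word> <word>not</word> <word>proud</word>"
    "</line></stanza></poem>"
)

def nesting(element, depth=0):
    """Return (depth, tag) pairs for an element and all its descendants."""
    pairs = [(depth, element.tag)]
    for child in element:
        pairs.extend(nesting(child, depth + 1))
    return pairs

root = ET.fromstring(SAMPLE)
for depth, tag in nesting(root):
    # Indent each tag according to how many larger 'dolls' enclose it.
    print("    " * depth + "<" + tag + ">")
```

Each word sits three levels deep, inside a line, inside a stanza, inside the poem.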












Special features of the text are marked up or encoded with ‘tags’, which are indicated with pointed brackets (such as <line>) and can contain various kinds of information, not just on the position of a word in a text. As opposed to word processing software such as MS Word, which allows you to, for instance, put a word in italics, but hides the command to do so, XML will make this command explicit by means of the tags (for instance the <i> of italics):


In MS Word:

She saw the original manuscript of Frankenstein.

In XML:

She saw the <i>original</i> manuscript of <i>Frankenstein</i>.


Each marked-up element is opened by an ‘opening tag’ (<i>) and closed by a ‘closing tag’ (</i>). Given XML’s hierarchical nesting structure, the order of tags is not arbitrary. If two structures interact in a non-hierarchical way, this results in a problem of ‘overlapping hierarchies’. For instance, the following mark-up is incorrect:


*<i>This sentence is in italics, and <b>part of it is also in bold </i></b> .


It is important to close the bold tag </b> before the italics tag </i>:


<i>This sentence is in italics, and <b>part of it is also in bold </b></i> .
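Whether a fragment respects the nesting rule can be tested mechanically, since any XML parser rejects overlapping tags. The following sketch assumes Python's standard xml.etree.ElementTree; the wrapper element <s> is a hypothetical addition, needed only because a parser expects a single root element around the sentence and its final full stop.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        # Hypothetical wrapper <s>: an XML document needs exactly one root.
        ET.fromstring("<s>" + fragment + "</s>")
        return True
    except ET.ParseError:
        return False

overlapping = "<i>This sentence is in italics, and <b>part of it is also in bold </i></b> ."
nested = "<i>This sentence is in italics, and <b>part of it is also in bold </b></i> ."

print(is_well_formed(overlapping))
print(is_well_formed(nested))
```

An XML editor performs the same check while you type and flags the overlapping version as an error.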


To try and standardize the tagging, a consortium called the Text Encoding Initiative (TEI) ‘develops and maintains a standard for the representation of texts in digital form’. It started in 1987 and, ever since, its aim has been to develop hardware- and software-independent methods for encoding humanities data in digital form: ‘The TEI emphasizes what is common to every kind of document, whether physically represented in digital form on disk or memory card, in printed form as book or newspaper, in written form as manuscript or codex, or in inscribed form on stone or wax tablet’ (Burnard 2014). The TEI Guidelines are an extensive list of suggestions to encode various textual phenomena, for instance if you wish to indicate why a certain passage is in italics (in the case of the example above: ‘original’ because it is highlighted, and ‘Frankenstein’ because it is a title):



She saw the <hi>original</hi> manuscript of <title>Frankenstein</title>.


This encoding enables, on the one hand, users to analyse the text (to search for instance how many titles are mentioned in the text; or what kind of words are emphasized/highlighted in the text), and on the other hand, editors to decide later on how to visualize the text in particular ways (for instance to render all titles in the document in red and all highlighted elements in green, for whatever reason) or to change the visualization at a later stage, so that a decision about transcription conventions at an early stage of the project is not set in stone forever. For instance, as the manuscript of Frankenstein features the hands of both Mary and Percy Bysshe Shelley, the value of the @hand attribute can be made operational to visualize their hands differently, as in the Shelley Godwin Archive. It also enables editors to store tagged information without necessarily visualizing all of it. Evidently, this is not the place to analyse all the details of the TEI Guidelines, which are available online, but the rich information in these guidelines may seem so profuse at a first reading that a brief discussion of a subset may be useful. The following example – the first page of the first draft of Krapp’s Last Tape (the so-called ‘Eté 56’ Notebook; UoR MS 1227-7-7-1, fol. 11r) – is meant to illustrate some of the more frequently recurring features and tags in the transcription of modern manuscripts, based on a subset developed for the Beckett Digital Manuscript Project. [12]
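The kind of search that this encoding makes possible can be illustrated with the Frankenstein sentence. The sketch below uses Python's standard xml.etree.ElementTree; the surrounding <p> element is an assumption added to give the sentence a root, and the namespace declaration is left out for brevity.

```python
import xml.etree.ElementTree as ET

# The <p> wrapper is an assumption; the sentence itself is the example above.
SENTENCE = (
    "<p>She saw the <hi>original</hi> manuscript of "
    "<title>Frankenstein</title>.</p>"
)

root = ET.fromstring(SENTENCE)

# Titles and highlighted words can now be retrieved separately,
# although both would be printed in italics.
titles = [element.text for element in root.iter("title")]
highlighted = [element.text for element in root.iter("hi")]

# Taking out all the tags gives back the plain reading text.
plain = "".join(root.itertext())

print(titles, highlighted, plain)
```

With the <i> tags of the earlier example, the same query could not have told the title apart from the emphasized word.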



A TEI-XML transcription can be made in the most rudimentary text editor (such as Notepad). An XML editor (such as oXygen) offers more specific assistance, facilitating the tagging. Before starting with the transcription proper (step 4), a few small steps need to be taken first, to prepare the XML document.


Step 1: ‘prologue’ and ‘root element’

Every XML document needs to have a ‘prologue’ and a ‘root element’. The prologue is a declaration of the version of XML that is being used and of the Unicode character set in which the document is encoded, for instance:


<?xml version="1.0" encoding="utf-8"?>


The root element is the XML element that contains all other elements (the largest matryoshka doll as it were). For the purposes of this transcription, the root element is <TEI>. In this element, we need to declare which namespace [13] we will be using in our document by adding a reference to the @xmlns [14] (XML namespace) attribute. In our case, that would be the namespace defined by the TEI Consortium:


<TEI xmlns="">

</TEI>

These opening and closing tags will enclose all the other elements that will be used for the transcription.


Step 2: ‘teiHeader’

For the purposes of this sample transcription, a text-oriented approach will be demonstrated, working within a <text> tag. For a document-oriented approach, the TEI advises to work with the <sourceDoc> tag. [15] Within the largest of matryoshka elements (<TEI>…</TEI>), the TEI advises to work with at least two parts, the <teiHeader> and the <text>:


<?xml version="1.0" encoding="utf-8"?>

<TEI xmlns="">

    <teiHeader><!-- metadata here --></teiHeader>

    <text><!-- transcription here --></text>

</TEI>

The header should contain a few required elements:




<teiHeader>

    <fileDesc>

        <titleStmt><!-- project title information here --></titleStmt>

        <publicationStmt><!-- publication information here --></publicationStmt>

        <sourceDesc><!-- source description here --></sourceDesc>

    </fileDesc>

</teiHeader>


For the purposes of this sample transcription, the title statement can simply contain the title, the name of the author and the name of the person responsible for the transcription:



<titleStmt>

    <title>One page from a manuscript by Samuel Beckett</title>

    <author xml:id="SB"><name>Samuel Beckett</name></author>

    <respStmt>

        <resp>transcribed by</resp>

        <name xml:id="yourInitials"> your name </name>

    </respStmt>

</titleStmt>


The metadata about the place of publication and the publishing house or institution that publishes your transcription can be mentioned in the ‘publication statement’. For instance:




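<publicationStmt>

    <publisher> publisher </publisher>

    <pubPlace> place of publication </pubPlace>

    <date> date </date>

    <availability>

        <licence>Copyright © Samuel Beckett 1958</licence>

        <p>Not to be distributed. Made available for educational purposes only.</p>

    </availability>

</publicationStmt> [16]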



A bibliographical description of the transcribed document can be accommodated in the ‘source description’, for instance:



<sourceDesc>

    <bibl> MS fragment from the first draft of <title>Krapp’s Last Tape</title> by Samuel Beckett, written in blue ink in a notebook, the cover of which is inscribed ('Eté 56') by the author, containing 96 leaves, blank from f. 45 until f. 96 verso. Squared paper. 22 x 13 cm. The fragment can be found on f. 11r.</bibl>

</sourceDesc>



The <teiHeader> can be filled with further options and tags that provide more detailed metadata about the digital file, the source document and the encoding, which can be convenient and helpful for cataloguing. For the purposes of this sample transcription, the mandatory fields will suffice to start encoding the transcription proper in the <text> element.
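Whether a header contains the mandatory parts can be checked automatically. The sketch below is a simplified stand-in for validation against the TEI schema, which an XML editor would normally perform. It assumes, following the TEI Guidelines, that the three statements sit inside a <fileDesc> element within the <teiHeader>; the function name missing_parts is hypothetical, and the namespace declaration is omitted to keep the example short. The sample header deliberately lacks a source description.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("titleStmt", "publicationStmt", "sourceDesc")

# Incomplete sample header (no <sourceDesc>); namespace omitted for brevity.
HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt><title>One page from a manuscript by Samuel Beckett</title></titleStmt>
    <publicationStmt><p>Not to be distributed.</p></publicationStmt>
  </fileDesc>
</teiHeader>
"""

def missing_parts(header_xml):
    """Return the required children of <fileDesc> that are absent."""
    header = ET.fromstring(header_xml)
    file_desc = header.find("fileDesc")
    if file_desc is None:
        return list(REQUIRED)
    present = {child.tag for child in file_desc}
    return [name for name in REQUIRED if name not in present]

print(missing_parts(HEADER))
```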


Step 3: framing the ‘text’

After the </teiHeader> has been closed, the transcription can be contained in the <text> tag for a text-oriented approach (see above) [17]:


<?xml version="1.0" encoding="utf-8"?>

<TEI xmlns="">

    <teiHeader>

        <fileDesc>

            <titleStmt><!-- project title information here --></titleStmt>

            <publicationStmt><!-- publication information here --></publicationStmt>

            <sourceDesc><!-- source description here --></sourceDesc>

        </fileDesc>

    </teiHeader>

    <text>

        <!-- transcription here -->

    </text>

</TEI>

The <text> element can contain a <front>, a <body> and a <back> element. In case the document is a book, <front> denotes the front matter of the source document (such as title page, title verso, table of contents); <body> consists of the text of the source document (for instance, all the chapters of a novel); <back> will contain the back matter of the source document (such as the index, bibliography, blurb and other paratextual material). Since the present sample is a transcription of a manuscript page, there is neither a <front> nor a <back> element, only a <body>. For the transcription of, for instance, a novel, the <body> can be divided into chapters. The chapter would then serve as a ‘division’, marked by means of a <div> tag. In this case, the body of the text is just one recto page from a notebook, so this page could serve as a division or <div>, accompanied by a number attribute @n to indicate the page number (11r):




<text>

    <body>

        <div n="11r" type="page" rend="recto">

            <p> Some text here. </p>

        </div>

    </body>

</text>


So far, the framework has been set up. Before confronting the nitty-gritty of the transcription, this may be a good moment for a short recapitulation and a few rules of thumb about the TEI syntax:


·      The general rule is that each element has an opening tag (e.g.: <text>) and a closing tag (e.g.: </text>).

·      The only exception to this rule is the so-called ‘empty element’. Empty elements point to one specific instance in the text, rather than to a selection of text. Empty elements end with a ‘/’ (for instance: <lb/> for a line break).

·      As in a family tree, a tag can be either a child, a sibling or a parent of another tag. In <text><body></body></text>, for example, <text> is the parent tag of <body> and <body> is the child tag of <text>.

·      The TEI syntax does not allow you to randomly put any child tag under a parent tag. Each tag must be the child of one of the tags in its ‘Contained by’ section (see the TEI Guidelines), and each tag can only be the parent tag of the tags specified in its ‘May contain’ section (for instance, the sequence <TEI><text><body></body></text></TEI> is possible, while <TEI><body></body></TEI> is not).

Attributes and their values

·      Most tags can be further specified by means of ‘attributes’.

·      In texts about tagging (such as this one), when attributes are discussed outside of the context of an XML document, they are preceded by an @ sign (for instance, the ‘number’ attribute in the <div n="11r"> tag is @n).

·      Attributes are written inside the tag’s angle brackets, and their values sit inside double quotation marks (for instance, in the tag <div n="11r">, @n is the tag’s attribute, and 11r is the attribute’s value).

Special case: @xml:id

·      Some tags will include an @xml:id attribute with a value that is a unique identifier, so that a single, specific element can easily be referred to from anywhere else in the transcription. For example, if you ‘declare’ in the <teiHeader> that Samuel Beckett is the author of the transcribed work by typing: <author xml:id="SB"><name> Samuel Beckett </name></author>, the system will recognize that every time the initials “SB” (preceded by a # – see two bullet points below) are mentioned, these refer to Samuel Beckett.

·      For this system to work, every @xml:id has to be unique (it can only occur once in the same XML-document), and has to start with a letter (e.g.: <seg xml:id="qsdfjklm"> is correct, while <seg xml:id="1qsdfjklm"> is incorrect). The length of the @xml:id may vary. [18]  

·      This way, attributes such as @hand (to identify the hand of the author of the manuscript) or @resp (to identify the transcriber) can point to a previously declared @xml:id. To do so, the value of that attribute must begin with a #, followed by the @xml:id (for instance, <del hand="#SB"> is correct, while <del hand="SB"> is incorrect). <del hand="#SB"> means that a deletion in a manuscript is made in Beckett’s handwriting.


·      In view of the visualization of your encoding (the way the transcription will appear on screen in an edition), it is advisable to use tags and spaces in such a way that, if all the tags were taken out, the result would be a normally spaced text. For instance, it is not advisable to put a space between an opening tag and the first word within the element, nor between the last word within the element and the closing tag. For example:

A <del>deleted</del> word = recommended

A <del> deleted </del> word = not recommended

A<del> deleted </del>word = not recommended
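Two of the rules of thumb above lend themselves to a simple automatic check. The following sketch, which uses Python's standard xml.etree.ElementTree, is an illustration rather than a substitute for schema validation. The function names, the <doc> wrapper and the sample sentence are hypothetical; only @hand and @resp are treated as pointer attributes. The first function lists duplicate @xml:id values and pointers that lack a # or point to an undeclared identifier; the second lists inline elements whose content begins or ends with a space.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
POINTER_ATTRIBUTES = ("hand", "resp")

def check_ids(xml_text):
    """Return (duplicate xml:id values, pointers that cannot be resolved)."""
    root = ET.fromstring(xml_text)
    seen, duplicates, dangling = set(), [], []
    for element in root.iter():
        identifier = element.get(XML_ID)
        if identifier is None:
            continue
        if identifier in seen:
            duplicates.append(identifier)
        seen.add(identifier)
    for element in root.iter():
        for name in POINTER_ATTRIBUTES:
            value = element.get(name)
            if value is None:
                continue
            # A pointer must start with '#' and name a declared xml:id.
            if not value.startswith("#") or value[1:] not in seen:
                dangling.append(value)
    return duplicates, dangling

def padded_elements(xml_text):
    """Return the tags of elements whose content starts or ends with whitespace."""
    padded = []
    for element in ET.fromstring(xml_text).iter():
        content = "".join(element.itertext())
        if content and content != content.strip():
            padded.append(element.tag)
    return padded

# Hypothetical sample: the second deletion lacks the '#' in its @hand value.
SAMPLE = (
    '<doc><author xml:id="SB"><name>Samuel Beckett</name></author>'
    '<p>A <del hand="#SB">deleted</del> word, '
    '<del hand="SB">another</del> one.</p></doc>'
)

print(check_ids(SAMPLE))
print(padded_elements("<p>A <del> deleted </del> word</p>"))
```

The spacing check is meant for inline markup only; a transcription that is laid out with one tag per line would need its indentation normalized first.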

Step 4: the transcription proper

Once the division has been determined, the transcription work can begin. In our case, the division works per page, so the @type is “page” and, since it is a right-hand page, the value of the @rend attribute is “recto”. The page break can be indicated by means of an ‘empty element’: <pb n="11r"/>. Everything that is on this page can be contained within the <div> element, beginning with the title, which can be marked as such by means of the <head> tag. Since the title is in blue ink, this can be marked by means of the @rend attribute. It is also underlined, which is a form of highlighting (<hi>); this rendition can be marked by means of a @rend attribute with the value “u”:


<div n="11r" type="page" rend="recto">

<pb n="11r"/>

<head rend="blueink">

<hi rend="u">

Magee Monologue

</hi>

</head>

The page is numbered (11) in pencil. This numbering may have been added later by the archivist at the University of Reading. This information can be added, for instance, by means of a <note>. The numbering is also a ‘metamark’: it does not belong to the text of the version (the first draft of Krapp’s Last Tape).

Beckett did not always date his manuscripts, but here he did. This can be marked by means of the <dateline> and <date> tags:



<metamark function="pagenumber">


<note>Page numbering in pencil probably added by the archivist.</note>




<date rend="blueink">






One of the most frequently used tags, especially in prose texts, is <p> for paragraph. Since this is a play, the text will mainly consist of a combination of stage directions and dialogue or monologue. The text of the manuscript opens with the words ‘wearish small old man’. If the handwriting is hard to read, it always helps if (part of) the same sentence occurs in a later version. The published text of the 1958 Evergreen Review edition opens as follows:


[0001] A late evening in the future. [0002] KRAPP'S den. [0003] Front centre a small table, the two drawers of which open towards the audience. [0004] Sitting at the table, facing front, i.e. across from the drawers, a wearish old man: KRAPP. (1958, 13)


For a genetic edition, one of the most powerful tools is the possibility for the user to diachronically compare segments (<seg>) of the text, across versions. To enable this type of genetic research, it is helpful to number the segments. It is up to the editor to decide what the size of these segments will be. The Beckett Digital Manuscript Project works with the sentence as a unit of comparison. The ‘sentence’ is broadly defined as a syntactic unit that ends with a full stop, an exclamation mark or a question mark. And the first edition serves as a ‘base text’ (or ‘anchor text’) that determines the numbering of the sentences. [19] In the case of the first draft of Krapp’s Last Tape, the opening words ‘wearish small old man’ – corresponding with (part of) sentence 4 in the published text – are therefore not encoded as the first segment, but as the fourth:




<seg n="MS-UoR-1227-7-7-1,[0004]">wearish small old man,</seg>




Even though the (verbless) sentence goes on (‘wearish small old man, almost blind’), the segment is closed </seg> after ‘man,’ because the following words ‘almost blind’ do not correspond to sentence 4, but to sentence 14 in the published text: ‘Very near-sighted (but unspectacled).’


<seg n="MS-UoR-1227-7-7-1,[0014]">almost <lb/>blind</seg>


The word ‘almost’ is on the first line, ‘blind’ on the second. This situation confronts us with a rather fundamental choice. Do we treat this manuscript as a document or as a text? Does the topography have priority, or the text? A document-oriented approach would imply that we treat each line as an <l> element, which would imply that ‘almost’ (in line 1) would be separated from ‘blind’ (in line 2). In a text-oriented approach the words ‘almost blind’ are treated as one syntactic unit and encoded as a segment <seg>. [20] From the text-oriented perspective, it is strictly speaking not necessary to indicate where the line ends, but it is certainly possible to do so, by making use of the empty element <lb/> (line break):


      <p><seg n="01">This is the first sentence.</seg> <seg n="02">This is the

<lb/>second sentence.</seg></p>


The next words in the manuscript are a parenthesis: ‘(x objects, writing up against eye)’. This segment never made it into the published text. Since it therefore does not correspond with any sentence in the base text, a solution for the numbering is to take the number of the preceding sentence that did make it into the base text and add a vertical bar | followed by a second numbering. In this case [0014|001] means this is a segment that did not make it into the final text and is situated in this manuscript as the first segment after the segment corresponding to sentence 14, which did make it into the published text.
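The numbering scheme is regular enough to be taken apart by a program, which is what later makes a comparison across versions possible. The sketch below is an assumption about the format, inferred only from the examples in this guide (a document label, a comma, a four-digit sentence number and an optional three-digit second number after the vertical bar); the function name parse_segment_number is hypothetical.

```python
import re

# Pattern inferred from values such as "MS-UoR-1227-7-7-1,[0014|001]".
SEGMENT_NUMBER = re.compile(
    r"^(?P<document>.+),\[(?P<sentence>\d{4})(?:\|(?P<extra>\d{3}))?\]$"
)

def parse_segment_number(n_value):
    """Split an @n value into document label, base-text sentence and extra number."""
    match = SEGMENT_NUMBER.match(n_value)
    if match is None:
        raise ValueError("unrecognised segment number: " + n_value)
    extra = match.group("extra")
    return (
        match.group("document"),
        int(match.group("sentence")),
        int(extra) if extra is not None else None,
    )

print(parse_segment_number("MS-UoR-1227-7-7-1,[0004]"))
print(parse_segment_number("MS-UoR-1227-7-7-1,[0014|001]"))
```

A segment whose third value is not None has no counterpart in the base text and is attached to the preceding base-text sentence.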

            The first word in the parenthesis is preceded by a cancelled letter. As a deletion, it can be encoded with the <del> element. The <del> tag can have several attributes, for instance the @hand to indicate who made the deletion. In this case the ‘value’ of the @hand attribute is #SB. The @rend can be used to mark the writing tool, in this case blue ink. Because the cancellation often makes the deleted text hard to decipher, the person who is responsible for its transcription can be credited by means of the @resp attribute and the person’s initials preceded by a # as the value. [21]

In this particular case, the deleted letter has been so thoroughly crossed out that it has become illegible. This means the text is <unclear> and can be marked up as such. If the text underneath the cancellation cannot even be conjectured, a possible transcription convention is to use a maximum of three x’s per word, or fewer if only one or two letters are deleted.


<seg n="MS-UoR-1227-7-7-1,[0014|001]">

(<del type="crossedOut" hand="#SB" rend="blueink" resp="#DVH">

    <unclear reason="crossedOut" resp="#DVH">x</unclear>



<unclear resp="#DVH">wr</unclear>iting up against<lb/>

<unclear resp="#DVH">eye</unclear>),



The next few segments are fairly straightforward, and correspond with the highlighted passages in the following quotation from the base text:


[0015] Hard of hearing. [0016] Cracked voice. [0017] Distinctive intonation. [0018] Laborious walk. [0019] On the table a tape-recorder with microphone and a number of cardboard boxes containing reels of recorded tapes. [0020] Table and immediately adjacent area in strong white light. [0021] Rest of stage in darkness. (1958, 13)


Sentences 16 to 18 are not yet present in the first draft. Instead, the draft mentions that the protagonist is sitting at a table, which – in the published version – is mentioned in sentence [0004]. In the manuscript, segment [0019] contains another deletion, this time followed by an addition, encoded with an <add> tag. [22] To transcribe the addition, it may be useful to indicate where it is added by means of a @place attribute, with values such as ‘supralinear’ (above the line), ‘infralinear’ (below the line), ‘facingleaf’ (if the addition is made on the facing page), ‘inline’ (in the case of a currente calamo or instant revision), ‘overwritten’ (if the addition is written on top of a previously written word), ‘marginleft’, ‘marginright’, ‘marginbottom’, ‘margintop’. [23]


<seg n="MS-UoR-1227-7-7-1,[0015]">

almost deaf (cupped ear down<lb/>

almost touching tape),

</seg>

<seg n="MS-UoR-1227-7-7-1,[0004]">

sitting <lb/>

centre on small plain wooden chair <lb/>

at small plain wooden table.<lb/>

</seg>

<seg n="MS-UoR-1227-7-7-1,[0019]">

On table: a thick worn ledger, <lb/>

a tape-recorder and a number <lb/>

of cardboard boxes

<del type="crossedOut"><unclear reason="crossedOut">filled</unclear></del>

<add hand="#SB" place="supralinear" rend="blueink" resp="#DVH">containing</add>

recorded <lb/> tapes.

</seg>

<seg n="MS-UoR-1227-7-7-1,[0020]">

Table and immediately adjacent <lb/>

zone in a circle of strong light,

</seg>

<seg n="MS-UoR-1227-7-7-1,[0021]">

rest <lb/>

of stage in shadow.<lb/>

</seg>


Here, the stage directions end. When the </stage> tag is closed, the monologue starts with the text spoken by Krapp, who does not have this name yet in this early draft and is simply called ‘A’. He reads in his ledger, ‘Box … three … spool … five’, turns his head without raising it and then relishes the long-drawn-out pronunciation of the word ‘spoooool’. His speech is contained in the <sp> element. In segment [0038] it contains a deletion in an addition and an addition to this deleted addition, all of which can be encoded:


<sp who="#A">



<seg n="MS-UoR-1227-7-7-1,[0036]">

(<hi rend="u">reading from ledger, his nose down on it</hi>)

</seg>


<seg n="MS-UoR-1227-7-7-1,[0037]">Box ... three ... spool ... five<lb/></seg>



(<seg n="MS-UoR-1227-7-7-1,[0038]">

<hi rend="u">He turns his head

<del type="crossedOut" hand="#SB" rend="blueink" resp="#DVH">from ledger</del>

<add hand="#SB" place="supralinear">

<del type="crossedOut"><unclear reason="crossedOut">away from table</unclear></del>

<add hand="#SB" place="supralinear">

<del type="crossedOut"><unclear reason="crossedOut">xxx xxx</unclear></del>

</add>

</add>

<add hand="#SB" place="supralinear" rend="blueink" resp="#DVH">front</add>,<lb/>

without raising it</hi>.

</seg>

<seg n="MS-UoR-1227-7-7-1,[0039]"><hi rend="u">Appreciatively</hi>.<lb/></seg>)


<seg n="MS-UoR-1227-7-7-1,[0040]">Spoool!</seg>


<seg n="MS-UoR-1227-7-7-1,[0041]">

<stage>(<hi rend="u">again, relishing the word.</hi>)</stage><lb/>

</seg>

<seg n="MS-UoR-1227-7-7-1,[0042]">Spooool!</seg>



The page ends with a stage direction that is interrupted by the page break. Since the page was taken as a division, the <div> and all the other elements it contains (such as <p> and <sp>) need to be closed as well, in the right order:




(<seg n="MS-UoR-1227-7-7-1,[0065]" rend="part1" xml:id="MS-UoR-1227-7-7-1d1e410">

<hi rend="u">He turns<lb/>

head front without raising, stares<lb/>








By taking the page as a division, an important element of a document-oriented approach can be taken into account in the text-oriented approach. To solve the problem that a sentence begins on one page and ends on the next page, the BDMP decided in 2008 to connect two <seg> elements to each other by making use of the @rend attribute and the unique identifier @xml:id. The value of the @rend attribute is “part1” for the first part of the sentence (on page 11r); in the second part of the sentence (on page 12r), the value of the @rend attribute is “part2”. Both parts of the sentence get the same automatically generated unique identifier (xml:id="someID"), the only difference being that in the second part of the sentence “part2” is added at the end of the identifier (xml:id="someIDpart2"). [24]


<div type="page" rend="recto" n="12r">

<pb n="12r"/>

<metamark function="pagenumber">


<note>Page numbering in pencil probably added by the archivist</note>


<sp who="#A">

<p><stage><seg n="MS-UoR-1227-7-7-1,[0065]" zone="zone_33" corresp="#12r" rend="part2" xml:id="MS-UoR-1227-7-7-1d1e410part2"><hi rend="u"><unclear>blankly</unclear> before him.</hi></seg></stage></p></sp>
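The two halves of such a split sentence can be brought together again by a program that follows the naming convention described above. The sketch below is a simplified illustration in Python: the identifier someID, the <body> wrapper and the function name are hypothetical, and the markup inside the segments (underlining, line breaks, unclear readings) has been left out.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified sample: two pages, one sentence split across the page break.
PAGES = """
<body>
  <div n="11r"><seg rend="part1" xml:id="someID">He turns head front without raising, stares</seg></div>
  <div n="12r"><seg rend="part2" xml:id="someIDpart2">blankly before him.</seg></div>
</body>
"""

def text_of(seg):
    """Return the text of a segment with whitespace normalized."""
    return " ".join("".join(seg.itertext()).split())

def join_split_sentences(xml_text):
    """Map each 'part1' identifier to the full text of the reunited sentence."""
    root = ET.fromstring(xml_text)
    by_id = {seg.get(XML_ID): seg for seg in root.iter("seg")}
    joined = {}
    for identifier, seg in by_id.items():
        if seg.get("rend") != "part1":
            continue
        # The second half carries the same identifier plus 'part2'.
        second = by_id.get(identifier + "part2")
        parts = [text_of(seg)]
        if second is not None:
            parts.append(text_of(second))
        joined[identifier] = " ".join(parts)
    return joined

print(join_split_sentences(PAGES))
```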




By choosing the page as a division (<div>), it is also possible to link the XML transcription to the digital facsimile. In the Beckett Digital Manuscript Project, this coupling of text and image happens at the level of the ‘zone’, a flexible textual unit of about a half dozen lines (depending on the context). The content of the zone can efficiently be linked to the corresponding sentences in the XML transcription. The advantage of this unit’s size is that it facilitates the ‘image/text’ visualization enabling the immediate comparison of the topography of the facsimile (document-oriented) with the linearized transcription (text-oriented). The image/text view is the most frequently used way of reading the transcriptions.



The zone can be drawn on the facsimile with a simple, free software tool like ImageJ, and in the BDMP the four coordinates necessary to encode this rectangular selection in the XML are stored in a <div> element. For instance:


<div sentence1="MS-UoR-1227-7-7-1d1e166" xml:id="zone_27" type="paragraph">




The coordinates for each zone are coded separately outside of the body of the transcription using a few XML elements that are project-specific to the BDMP. [25] Each zone is given an identifier (in the case of the ‘wearish small old man’ paragraph: “zone_27”) by which it is linked to the group of sentences (<seg> elements) that occur on that zone on the image.


<seg n="MS-UoR-1227-7-7-1,[0004]" zone="zone_27" corresp="#11r" xml:id="MS-UoR-1227-7-7-1d1e166">

wearish small old man,

</seg>


Transcribing manuscripts in TEI/XML can be a laborious task and researchers may wonder whether it is worthwhile, but the advantages of having transcriptions in this structured format are considerable. The result can be transformed into an unlimited number of visualisations and into many formats, such as HTML, PDF, EPUB, JSON, TXT and LaTeX. A transformation stylesheet, usually written in XSLT (Extensible Stylesheet Language Transformations), can be of help to specify visualisation instructions for every element in the transcription: for instance additions in superscript or in a particular colour, deletions struck through or hidden, unclear text in grey or between brackets. These specifications can even be made on the basis of attribute values: for instance, one might want to put only supralinear additions in superscript, but not additions that are encoded as ‘inline’. Or for example, based on the value of the @rend attribute indicating the writing tool, the various annotations can be visualized in the same colours as they appear in the draft. In a digital scholarly edition, an editor does not need to settle for one visualisation. In a menu, various options can be offered to users based on their particular interest. Since layout rules are stored separately from the TEI-XML encoding, visualisation conventions can be changed easily for extensive collections. In addition, the encoding creates many possibilities for more refined text searches. As indicated above, by indicating for instance that not all passages in italics are titles, it becomes much easier to search for all the titles mentioned in the text. The more the transcriber encodes, the more flexible the text becomes in view of future search actions and other areas of digital text analysis.
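The principle of such a transformation can be sketched without XSLT. The Python function below is not the stylesheet of any existing edition, only a minimal stand-in that applies the conventions mentioned above: deletions struck through, supralinear additions in superscript (other additions left in the line), unclear readings in grey, and line breaks preserved. The choice of HTML tags and the simplified sample segment are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

def render(element):
    """Render a transcription element and its content as an HTML string."""
    inner = escape(element.text or "")
    for child in element:
        inner += render(child) + escape(child.tail or "")
    if element.tag == "del":
        return "<s>" + inner + "</s>"
    if element.tag == "add":
        # Only additions above the line are raised; 'inline' ones are not.
        if element.get("place") == "supralinear":
            return "<sup>" + inner + "</sup>"
        return inner
    if element.tag == "unclear":
        return '<span style="color: grey">' + inner + "</span>"
    if element.tag == "lb":
        return "<br/>"
    # Any other element (such as <seg>) contributes only its content.
    return inner

# Simplified version of segment [0019] from the first draft.
SEG = (
    "<seg>of cardboard boxes <del><unclear>filled</unclear></del> "
    '<add place="supralinear">containing</add> recorded <lb/>tapes.</seg>'
)

print(render(ET.fromstring(SEG)))
```

Changing a convention, for example hiding deletions instead of striking them through, only requires a change in this function and leaves the transcription itself untouched.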

4.2. From microgenesis to macrogenesis

The transcription work focuses on the microgenesis [26] of ‘intradocument variation’ (layers of writing within this one draft). The macrogenesis opens up the scope across versions.

Thanks to the numbering of the segments, the digital architecture of the genetic edition can be designed in such a way that it can retrieve all the versions of one particular sentence and visualize it in a ‘synoptic sentence view’. This view enables users to compare versions, which is what Donald Reiman dubbed ‘versioning’ (Reiman 1987, 167-180), but at the level of the sentence, which facilitates the comparison. Evidently, different sizes of textual units can be chosen as a working unit (see above).

In the synoptic sentence view, the sentences in the manuscript that did not make it into the final text, such as segment [0014|001] (see above), are highlighted in bold type and linked to the preceding sentence that did make it into the final text (in this case segment [0014]).
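How the segment numbering makes such a view possible can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the BDMP's implementation: it assumes that every @n value has the shape `document-id,[number]` or `document-id,[number|extra]`, as in the examples quoted in this guide, and the witness sigla and sentence texts in `segments` are invented placeholders.

```python
import re
from collections import defaultdict

# Assumed shape of @n: "<document id>,[<base number>]" with an optional "|<extra>" part.
N_PATTERN = re.compile(r"^(?P<doc>[^,]+),\[(?P<base>\d+)(?:\|(?P<sub>\d+))?\]$")


def parse_n(n):
    """Split an @n value into (document id, base number, extra number or None)."""
    m = N_PATTERN.match(n)
    if m is None:
        raise ValueError(f"unrecognised @n value: {n!r}")
    return m["doc"], m["base"], m["sub"]


def synoptic_view(segments):
    """Group (n, text) pairs by base sentence number, keeping the input order."""
    view = defaultdict(list)
    for n, text in segments:
        doc, base, sub = parse_n(n)
        view[base].append({
            "doc": doc,
            "text": text,
            # An extra number marks a sentence that did not reach the final text.
            "unpublished": sub is not None,
        })
    return dict(view)


# Invented placeholder data: two hypothetical documents, MS-A and TS-B.
segments = [
    ("MS-A,[0014]", "first draft of the sentence"),
    ("MS-A,[0014|001]", "a sentence that was later dropped"),
    ("TS-B,[0014]", "revised version of the sentence"),
    ("TS-B,[0015]", "the next sentence"),
]

view = synoptic_view(segments)
for entry in view["0014"]:
    marker = "**" if entry["unpublished"] else ""
    print(f'{entry["doc"]}: {marker}{entry["text"]}{marker}')
```

Because the dropped sentence carries the base number of the preceding sentence, it is grouped with that sentence automatically and can be flagged for bold display.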

To turn this form of versioning into the equivalent of a critical apparatus of textual variants, recent developments in (semi-)automatic or computer-assisted collation (such as Juxta, CollateX or HyperCollate) enable editors to highlight the variants between versions. ‘Collation’ in this context of textual scholarship differs slightly from ‘collation’ in the context of bibliography. While the bibliographer’s collation is quite a specific, condensed form of describing a book’s size, the number of leaves, the way they are bound and numbered (or not), the scholarly editor’s collation is a comparison of versions. The result of such a comparison (the so-called ‘critical apparatus’) often used to be the most tedious part of the edition (at least the way it was represented traditionally in the print paradigm): a long list of differences between published works, usually at the back of the edition. In German editorial theory, this type of apparatus has therefore been called a ‘Variantenfriedhof’ – ‘a cemetery of variants’ – whereas, actually, burying is the opposite of what editors really want to do with variants. This is therefore a plea for keeping variants alive by means of ‘genetic editing’ and presenting works of literature both as products and as processes.

Take for instance the moment Krapp3 listens to Krapp2 (on the tape) recalling his mother on her deathbed. The second typescript version reads: ‘there is of course the house on the canal where mother lay a-dying, in the early autumn, after her long viduity’ (BDMP3, ET2, 03r). This is a tape that Krapp made decades earlier. By now he is an old man, and when he hears the word ‘viduity’, he gives a start, switches off the machine, winds back the tape a little, and listens again: ‘– a-dying, in the early autumn, after her long viduity, and the –’ He switches off the machine, and looks ‘Puzzled’ (03r).

Intradocument variants

The old Krapp asks himself ‘Viduity?’ In the thirty years since he made the tape, he has apparently forgotten the word, and now decides to look it up in the Concise Oxford dictionary. Beckett then changes his mind and writes in the margin: ‘or Johnson’s dictionary and quotes example’.



The next version shows an interesting intradocument variant. The first (typed) layer of writing reads: ‘– a-dying, in the early autumn, after her long viduity, and the –’ (exactly as in the preceding typescript). After having typed ‘viduity’, Beckett changes this in the margin to the same word, but preceded by an important pause: he lets Krapp hesitate and then use the rather obscure word ‘viduity’. Beckett thus introduces pauses as gaps of silence in the text.



Interdocument variants

Readers who are interested in the interdocument variants can choose the option ‘compare sentences’, which makes the <seg> number (see above) appear in front of every sentence. For instance, the stage direction that says that Krapp switches off the machine and looks puzzled is sentence 187. This hyperlinked number leads the reader to the ‘synoptic sentence view’.


Since the word ‘Puzzled’ was omitted in the next versions (and therefore, in the XML encoding, received the number of the previous sentence that did make it into the published version, followed by an extra number, [0187|001]), it is highlighted typographically (in bold typeface) in the synoptic view to indicate that it did not make it into the published text.

As indicated above, this form of versioning has the advantage over a traditional apparatus that it keeps the syntax of the sentence intact; but it has the disadvantage that it does not highlight the variants. To highlight the variants, the integrated collation tool CollateX (developed by Ronald Haentjens-Dekker) collates the versions ‘on the fly’, within a hundredth of a second (partially thanks to the small size of the textual units of comparison, <seg>).

To show how this collation tool can be useful to trace interdocument variation, the ‘viduity’ passage is a good example. It is a superb moment in the play because, not unlike the ‘hole’ in the text thanks to the intradocument variant ‘…(hesitates)…’, Beckett here uses an interdocument variant to create a sort of hole in Krapp’s memory, a lexical gap in his vocabulary. This gap was not present in the first two drafts (the manuscript and the first typescript), where he simply used the word ‘widowhood’. It is thanks to a simple interdocument variant between the first and second typescript (versions 2 and 3) that Beckett creates the opportunity for himself to build the whole scene with the dictionary and elaborate on the gap in Krapp’s vocabulary, a lexical emptiness or ‘viduity’ as it were.




The later versions also show an interesting interdocument variant that illustrates the workings of the collation tool. In the published text, Krapp is listening to the tape he made thirty years earlier and hears the words ‘a-dying, in the late autumn, after her long viduity’. He winds back the tape, and when he listens to the same passage again, ‘in the late autumn’ has disappeared. This is a clear instance of a textual error. In the movie industry, it would be called a continuity error: the words ‘in the late autumn’ are on the tape so they should still be there when Krapp rewinds and listens to it again. The collation output helps the reader by making the (erroneous) omission of this passage visible: Beckett forgot to copy ‘in the late autumn’ in the fifth version of this sentence [0186], and it took several editions before the error was noticed and emended.
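The way a word-level collation exposes such an omission can be illustrated with a small sketch. It uses Python's standard `difflib` module as a stand-in: it is not CollateX and does not reproduce its algorithm or its output format. The first witness is the reading quoted above; the second is an assumed reconstruction of the replayed passage with the phrase removed, and the function name `collate` is chosen for the example only.

```python
from difflib import SequenceMatcher


def collate(a, b):
    """Align two witnesses word by word.

    Returns a list of (operation, words_from_a, words_from_b) rows, where
    operation is one of 'equal', 'delete', 'insert' or 'replace'.
    """
    tokens_a, tokens_b = a.split(), b.split()
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    rows = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        rows.append((op, " ".join(tokens_a[i1:i2]), " ".join(tokens_b[j1:j2])))
    return rows


first_hearing = "a-dying, in the late autumn, after her long viduity"
# Assumed reconstruction: the same passage without the omitted phrase.
replay = "a-dying, after her long viduity"

for op, left, right in collate(first_hearing, replay):
    print(f"{op:8} | {left:24} | {right}")
```

The single `delete` row isolates the missing words, while the `equal` rows keep the rest of the sentence readable around the variant.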



Computer-assisted collation has a relatively long tradition in digital humanities, going back at least to the use of TUSTEP by Hans Walter Gabler and his team for the production of their edition of James Joyce’s Ulysses. So far, collation tools have mostly been used as tools for editors, to assist them in producing a critical apparatus. By integrating CollateX in the edition, automatic collation can also be offered as a tool for users, to highlight variants.

The collation tool thus serves as a ‘collation engine’, analogous to a ‘search engine’. The collation engine takes the digital transcriptions as input and performs a service for the user, who can leave certain witnesses out of the collation if they so choose. In this way, instead of turning the collation into a critical apparatus that is often experienced as the most tedious part of a critical edition, a digital edition can offer automatic collation as an alternative tool to help users discover complex and therefore interesting textual instances in the manuscripts and other textual versions. The integration of CollateX has the advantage that it can collate manuscript versions: the tool is able to recognize deleted and added passages (which are so typical of transcriptions of modern manuscripts); moreover, when it collates these passages, it can also produce a collation output that visualizes the result with the same transcription conventions as the ones used elsewhere in the edition.
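Two of the points above, leaving witnesses out and handling deleted and added passages, can be sketched briefly. The following Python illustration rests on assumptions and is not how CollateX or the BDMP handle manuscript markup: the function names, the witness sigla `V1` to `V3` and the sample sentences are invented, and each transcription is simply flattened to the text of one writing layer before any comparison would take place.

```python
import xml.etree.ElementTree as ET


def reading(node, layer):
    """Flatten a TEI-like node to plain text for one writing layer.

    layer="first" keeps deletions and skips additions;
    layer="last" skips deletions and keeps additions.
    """
    skip = "add" if layer == "first" else "del"
    if node.tag == skip:
        return ""
    text = node.text or ""
    for child in node:
        # The tail follows the child element, so it is kept even if the child is skipped.
        text += reading(child, layer) + (child.tail or "")
    return text


def select_witnesses(witnesses, exclude=()):
    """Return the witnesses a user wants collated, in their original order."""
    return {sigil: xml for sigil, xml in witnesses.items() if sigil not in exclude}


# Invented witnesses with hypothetical sigla (namespace omitted for brevity).
WITNESSES = {
    "V1": "<seg>a dark room</seg>",
    "V2": "<seg>a <del>dark</del><add>bright</add> room</seg>",
    "V3": "<seg>a bright room</seg>",
}

chosen = select_witnesses(WITNESSES, exclude=("V3",))
layers = {
    sigil: (reading(ET.fromstring(xml), "first"), reading(ET.fromstring(xml), "last"))
    for sigil, xml in chosen.items()
}
print(layers)
```

Each selected witness yields a first and a last layer, so a comparison can show both where a version starts from and where it ends up.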

Reading across versions

The rationale behind offering this tool to the reader rather than only to the editor is that the collation of different versions is an active form of reading across versions, which is neither a privilege nor a chore that is exclusively reserved for textual scholars. At this moment collation tools may not yet produce results that are 100% reliable (i.e. not as reliable as a critical apparatus), but the same goes for other everyday tools such as search engines, which also produce results that have to be filtered by the reader. While we continue to collaborate to make the collation algorithm smarter, reading across versions with a collation engine does not require more ingenuity from readers than a search engine does. The advantage is that textual variants do not need to be buried in the so-called ‘Variantenfriedhof’ of a critical apparatus, but that readers can actively engage with the differences between versions, in a way that allows them to zoom in and out and thus move freely on the continuum between close reading and more distant reading.


[2] "il y a une grande distance apparente, dans les procès évolutifs, entre ce que l’on pourrait appeler les effets d’exogenèse et ceux d’endogenèse. Le terme d’exogenèse pourrait paraître dangereux aux puristes de la textualité, s’il signifiait qu’on sortît des documents écrits. [ ] Il est vrai qu’il n’est pas facile d’éviter le ‘ceci cause cela’, le balzacien ‘voici pourquoi’, la présentation de l’œuvre comme un reflet de la vie [ ] . Par conséquent, nous ne pouvons nous placer au départ que du point de vue du ‘donné’ écrit, dans la mesure où il a été conservé ; nous ne pouvons qu’essayer d’analyser, de structurer, en évitant le plus possible l’herméneutique, l’ensemble des signifiants conservés" (Debray Genette 1977, 24).

[3] Rudolf Mahrer and the contributors to ‘après le texte’ (issue nr 44 of Genesis) call it ‘la genèse post-éditoriale’ (Mahrer 2017, 22). As Mahrer notes, ‘La polarité entre texte et avant-texte se fait moins claire lorsqu’on prend en considération des documents qui ne sont génétique qu’a posteriori’ (18).

[4] For a further discussion of this model, see Dirk Van Hulle, ‘Modern Manuscripts’ in The Oxford Encyclopedia of Literary Theory (2019).

[5] The numbering is based on the HRC’s system of cataloguing in boxes and folders: SB stands for the Samuel Beckett collection, box 4, folder 2, item 1

[6] The original folder in which the typescript entered the UCSD Special Collections mentions explicitly: ‘KRAPP’S LAST TAPE (original ms) / (Typed by S.B.)’, but it has several features that occur nowhere else in Beckett’s own typescripts. Beckett’s typewriter had a separate key for the exclamation mark; here, the exclamation mark (for instance after Krapp’s first ‘Ah!’) is a combination of an apostrophe on top of a full stop. Beckett was a precise typist; here, the typescript shows uncharacteristically many typos. Finally, this is the only typescript that adds the author’s name (‘by Samuel Beckett’) underneath the title, which would have been very atypical for Beckett’s habitual typing practice.

[7] ‘“Love” said a wearish voice behind him, “turn round my young friend, face this way do, and tell me what you know of that disorder”’ (Beckett 2014, 15)

[8] ‘I splashed past a little wearish old man,
scuttling along between a crutch and a stick’ (Beckett 2012, 6)

[9] ‘little wearish old man (Democritus)’ (Pilling 1999, 104). Cf. ‘Democritus (…) was a little wearish old man, very melancholy by nature, averse from company in his latter days, and much given to solitariness (…)’ (ed. Floyd Dell and Paul Jordan-Smith, New York: Tudor, 1948 [1621], p. 12).

[10] The photograph published in the catalogue is the lower half of page 4, which corresponds with the lower half of ET5, folio 4r (

[11] A few examples: Arthur Schnitzler digital:

Beckett Digital Manuscript Project:

Blake Archive:

Charles Harpur Critical Archive:

Digital Thoreau:

Emily Dickinson:  and


Jane Austen’s fiction manuscripts:  

James Joyce Digital Archive:

Melville’s books: 

Shelley-Godwin Archive:

[12] The BDMP uses a customized XML Schema based on the TEI P5 version 2.0.0. A full documentation of the project’s use of tags, attributes and suggested attribute values can be found at, along with a downloadable XML validation schema.

[13] A ‘namespace’ allows us to distinguish the ‘names’ of the elements, attributes, etc. in our computer language from those used in other languages. Simply put, it allows us to distinguish a <table> element in one language, where it describes a set of rows and columns, from a <table> element in another, where it describes a piece of furniture. By declaring this namespace, we tell the XML parser (i.e. software that interprets our code) that we are using TEI-XML rather than another XML language (such as HTML for web pages, or ALTO-XML for OCR).

[14] When attributes are being discussed outside of the context of an actual XML document, they are referred to by means of an @ sign in front of the attribute. For more information about attributes, see below.

[16] The TEI guidelines suggest also linking to a concrete licence, e.g.: <licence target="">. Another, equally valid way to encode this information is:

<availability status="restricted">

          <p>Copyright © Samuel Beckett 1958</p>

          <p>Not to be distributed. Made available for educational purposes only.</p>

</availability>



[17] For a document-oriented approach, the TEI advises working with the <sourceDoc> tag (

[18] TEI stipulates ‘that the reference be without internal spaces, begin with a letter or underscore, and contain no characters other than letters, digits, hyphens, underscores, full stops, and the various combining and extender characters, as defined by the XML specification’ (

[19] As the example below will also demonstrate, the term ‘sentence’ is used rather loosely here, and based solely on the syntactic unit as it occurs in the base text. Indeed: in other versions of the text, the semantically relevant fragment may only be part of a sentence, or may comprise multiple sentences. This is why we encode these fragments with the TEI’s <seg> element (for ‘segment’) rather than with its <s> element (for ‘sentence’).

[20] If we chose the documentary approach but still wanted to mark up the sentences, there would be a danger of overlapping hierarchies:

            *<l><seg n="01">This is the first sentence.</seg> <seg n="02">This is the</l>

            <l>second sentence.</seg></l>

[21] Just like the unique XML ID for the author (e.g. to indicate the @hand), a unique XML ID also has to be created for the transcriber in the title statement (see above, Step 2).

[22] If the editor wishes to mark up this unit of deletion + addition as a substitution, they can make use of the <subst> element, but for many projects it may suffice to just imply the substitution by letting an <add> element follow immediately after a <del> element.

[23] For clarity’s sake, the attributes that are always the same have been omitted in this and following examples.

[24] This way of linking different parts of the same ‘segment’ developed organically in the early stages of the BDMP. For alternative pointing systems (e.g. by using the @next/@prev or @part attributes) and recommended practices, please consult Chapter 16 of the TEI Guidelines: Linking, Segmentation, and Alignment, and the technical specifications of the global ‘Linking’ attribute class

[25] The following describes the BDMP’s idiosyncratic way of linking text to image, which was devised and implemented before new elements, attributes, and best practices to make these connections were introduced in TEI P5 Release Version 2.0.0 of the TEI’s guidelines. At this point, converting the BDMP to meet the newer recommendations and ensuring backwards compatibility would be too time-consuming, and would risk breaking existing features. For new projects, we recommend reading Chapter 11 of the TEI Guidelines (“Representation of Primary Sources”) and carefully selecting the method that best suits your materials.

[26] Born-digital works and keystroke logging enable us to zoom in on an even smaller level (nanogenesis); and if we analyse the genesis of a work as part of an oeuvre (and a sous-oeuvre), the macrogenesis can be taken to another level (megagenesis).