Samuel Beckett
Digital Manuscript Project

Brief Technical Documentation

This document has been updated for the upcoming publication of the eighth module: Play / Comédie (December 2021). The document in its first state (published on 24/06/2011) can be found here. The document in its second state (published in September 2013) can be found here.
The electronic edition is encoded in XML (eXtensible Markup Language). The encoding design started from the P5 version of the Guidelines of the TEI (Text Encoding Initiative) and expands the DTD (Document Type Definition) with project-specific tags and attributes where needed.[1] The L'Innommable/The Unnamable module and all modules published since are based on version 2.3.0 of TEI P5[2], while the first module, published in 2011, is based on version 1.0.0.[3] The tagset of the first module also incorporated some tags from a working document of the TEI SIG "Manuscripts".[4] A number of these proposed tags were introduced into TEI P5 version 2.0.0[5], some in modified form. Because of these modifications, and with the following modules in mind, we decided to comply with the P5 version that was current at the time the XML transcriptions of this module were finished (June 2013).[6]
The encoding is based on the definitions of crucial notions such as 'document', 'text', 'version', and 'work' by Peter Shillingsburg in his book Scholarly Editing in the Computer Age, notably the chapter entitled 'Ontology'.
In 2021 the technical architecture behind the edition was migrated from its previous publication framework[7] into eXist-db, a NoSQL document database and application platform.


The header contains metadata such as a title statement and publication statement, mentioning the coordinates of the Centre for Manuscript Genetics (University of Antwerp); a brief source description; and a profile description with information on the languages and the handwritings in the document.

Structural tags

<text>: This tag is used to indicate 'the actual order of words and punctuation as contained in any one physical form' (Shillingsburg 1996: 46). The physical form is paper and ink. As a physical vessel the document contains only one text, but it may contain more than one version of more than one work - a 'work' being 'the message or experience implied by the authoritative versions of a literary writing' (Shillingsburg 1996: 176). The archive catalogue number serves as unique id.

<div type="notebook">: As a first child the <body> tag will have a <div> element that declares the type of document that is being transcribed. Values are "notebook", "typescript" and "looseleaves". In the case of typescripts and loose leaves, documents can sometimes have writing on verso pages. If a transcription contains material on verso pages, this top-level <div> tag will have an attribute 'subtype="withversos"'.

<div type="page" rend="recto">: Since BDMP module 4 we have encoded all documents in a series of <div type="page"> elements.

<div type="paralipomena">: Apart from versions, a document may also contain fragments of text (jottings, notes, reflections, try-out sentences, and so forth) which strictly speaking do not belong to a version of a work. These paralipomena are indicated by means of the tag <div type="paralipomena">.

<div>: Used without attributes, this tag indicates a version, i.e. 'one specific form of the work - the one the author intended at some particular moment in time' (Shillingsburg 1996: 44). The writing layers are indicated by means of <del> (deletion) and <add> (addition) tags.

<p>: Versions or paralipomena may consist of several paragraphs.

<seg>: Each paragraph usually consists of several sentences. When Beckett did not work with full sentences (e.g. in Not I / Pas moi) the segment consists of a few lines of text, i.e. a unit of text that can easily be compared to other versions.

Global attributes

The xml:id of the <text> tag is the document's archive number. According to Peter Shillingsburg's definition, the variant forms of a work usually have the same name, but in some cases 'there will be disagreement over whether a variant form is in fact a variant version or a separate work' (Shillingsburg 1996: 176).

This attribute indicates the language in which the version is written.

The catalogue number is followed by the number of the sentence in the base text (see chapter "base texts"):
<seg n="MS-UoR-2934,[0127]">
In the case of a sentence that eventually did not make it into the base text, the number of the preceding sentence that did make it into the base text is followed by | and an extra number:
<seg n="MS-UoR-2934,[0127|001]">
The first number always consists of 4 digits: 0001 and so on; the second number, after the |, always consists of 3 digits. In the visualization, this extra sentence (or phrase) appears in bold, because it constitutes a deviation from the base text.
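
The numbering scheme described above can be illustrated with a minimal Python sketch. It assumes that an n value always has the shape "catalogue number, comma, bracketed number(s)" as in the two examples, and that the catalogue number itself contains no comma. The names N_PATTERN, SentenceRef and parse_n are hypothetical and are not part of the edition's code.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: "<catalogue>,[NNNN]" or "<catalogue>,[NNNN|MMM]"
# (4-digit base-text sentence number, optional 3-digit extra number).
N_PATTERN = re.compile(
    r"^(?P<catalogue>[^,]+),\[(?P<base>\d{4})(?:\|(?P<extra>\d{3}))?\]$"
)


class SentenceRef(NamedTuple):
    catalogue: str        # archive catalogue number of the document
    base: int             # sentence number in the base text
    extra: Optional[int]  # None when the sentence belongs to the base text


def parse_n(value: str) -> SentenceRef:
    """Split the n attribute of a <seg> into its components."""
    match = N_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognised n value: {value!r}")
    extra = match.group("extra")
    return SentenceRef(
        match.group("catalogue"),
        int(match.group("base")),
        int(extra) if extra is not None else None,
    )
```

A sentence whose parsed extra component is not None would be one of the sentences that deviate from the base text, i.e. those shown in bold in the visualization.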

The version attribute indicates the chronological order of the versions of a textual unit (section, paragraph, sentence).

In L'Innommable/The Unnamable, the chronology of versions largely corresponds to the chronology of the documents.[8] Only where there is more than one version of the same sentence within one document, and where the order of writing does not correspond to the documentary order, has a version attribute been added to the <seg> tag to encode the correct chronology.

In Stirrings Still/Soubresauts, the chronology is a lot more complex and version attributes have been added to all sections, paragraphs and sentences.
In the case of partial versions the version number is followed by a letter (e.g. typescript version 12 of Stirrings Still/Soubresauts contains a redraft of its last paragraph; this redrafted paragraph is indicated by the number 12a).
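
As a rough illustration of how such version numbers could be put in order, the Python sketch below splits a value like "12a" into its number and its optional letter. It assumes that a partial version (12a) follows the full version with the same number (12), which the text above suggests but does not state as a rule. The function name version_key is hypothetical.

```python
import re


def version_key(version: str) -> tuple:
    """Return a sort key (number, letter) for a version value such as '12' or '12a'."""
    match = re.fullmatch(r"(\d+)([a-z]?)", version)
    if match is None:
        raise ValueError(f"unrecognised version value: {version!r}")
    # An empty letter sorts before 'a', so '12' precedes '12a'.
    return (int(match.group(1)), match.group(2))


def sort_versions(versions):
    """Order version values numerically, with partial versions after their parent."""
    return sorted(versions, key=version_key)
```

Sorting on the numeric part avoids the plain string ordering in which "12" would come before "2".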

<seg> tags have a zone attribute which holds the name(s) of the zone(s) on a page in the image / text feature that the sentence is a part of.

Textual Alterations

The most frequently occurring tags in the XML transcriptions are deletions and additions:

<del>: For each cancelled phrase the type of cancellation, the author of the cancellation, and the writing tools are indicated, as well as the person responsible for the transcription (the editor):
<del type="crossOut" hand="#SB" rend="black ink" resp="#DVH">...</del>

In the case of instant alterations (currente calamo) the type attribute value is 'instantcorrection'. Instant corrections are only marked if there is no doubt that the cancellation cannot have been introduced at a later stage: for instance in the sentence 'perhaps not again never to be heard again', 'not again' is marked as being followed by an instant correction; 'to be heard' is not, because the cancellation may have been introduced at a later stage.

<delSpan spanTo="#anchor"/>: For passages cancelled by Beckett or 'marked as used', three types can be distinguished: heavy crossing-out, a diagonal line, or a St. Andrew's cross.

<add>: For additions the place of the addition is also indicated:
<add place="marginleft" hand="#SB" rend="black ink" resp="#DVH">...</add>
The place indications used in the present edition are:
'marginleft,' 'marginright,' 'margintop,' 'marginbottom,' 'facingleaf,' 'inline,' 'supralinear,' 'infralinear,' 'overwritten.'
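
The attributes on <del> and <add> can be read out with a few lines of Python. The sketch below is only an illustration: the sample fragment reuses the attribute values shown above but has invented text content, it is not wrapped in the TEI namespace that real transcriptions may use, and the names SAMPLE, PLACES, list_alterations and unknown_places are hypothetical.

```python
import xml.etree.ElementTree as ET

# Place values listed in this documentation.
PLACES = {
    "marginleft", "marginright", "margintop", "marginbottom", "facingleaf",
    "inline", "supralinear", "infralinear", "overwritten",
}

# Hypothetical fragment: attribute values as documented, text content invented.
SAMPLE = """<seg>
  <del type="crossOut" hand="#SB" rend="black ink" resp="#DVH">old reading</del>
  <add place="marginleft" hand="#SB" rend="black ink" resp="#DVH">new reading</add>
</seg>"""


def list_alterations(xml_text):
    """Return (tag, type or place, hand, text) for every <del> and <add>."""
    rows = []
    for el in ET.fromstring(xml_text).iter():
        if el.tag in ("del", "add"):
            kind = el.get("type") or el.get("place")
            rows.append((el.tag, kind, el.get("hand"), "".join(el.itertext())))
    return rows


def unknown_places(xml_text):
    """Return place values on <add> elements that are not in the documented list."""
    root = ET.fromstring(xml_text)
    return [el.get("place") for el in root.iter("add")
            if el.get("place") not in PLACES]
```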

Open variants: alternative readings

Open variants have been marked up in this way:
when at last out again <seg type="alternative" xml:id="alt1">he knew not</seg>
<add place="above" type="alternative" xml:id="alt2">no knowing</add>


Transpositions (when the author moves blocks of text to a different position, using arrows, asterisks, numbers or lines) have been marked up in this way:
where he sits <seg type="transposition" xml:id="trans1">at his table</seg> <seg type="transposition" xml:id="trans2">head on hands</seg>.

All transpositions are declared in the header:

<ptr target="#trans2"/>
<ptr target="#trans1"/>
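
To show how the pointers relate to the marked-up segments, the Python sketch below rebuilds the transposed reading of the example sentence. It assumes that the order of the <ptr> elements gives the order in which the segments are to be read, that the pointers sit inside a wrapper element (written here as <transpose>, which is an assumption about the header markup), and that each segment contains plain text only. The names BODY, POINTERS and apply_transposition are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

BODY = (
    '<p>where he sits <seg type="transposition" xml:id="trans1">at his table</seg> '
    '<seg type="transposition" xml:id="trans2">head on hands</seg>.</p>'
)
# Assumed wrapper element around the two pointers from the header.
POINTERS = '<transpose><ptr target="#trans2"/><ptr target="#trans1"/></transpose>'


def apply_transposition(body_xml, pointers_xml):
    """Return the text of body_xml with the segments read in pointer order."""
    body = ET.fromstring(body_xml)
    targets = [ptr.get("target").lstrip("#")
               for ptr in ET.fromstring(pointers_xml).iter("ptr")]
    # Segments named by the pointers, in document order.
    segs = [seg for seg in body.iter("seg") if seg.get(XML_ID) in targets]
    text_by_id = {seg.get(XML_ID): seg.text or "" for seg in segs}
    # Fill the existing slots with the segment texts in pointer order.
    for seg, target in zip(segs, targets):
        seg.text = text_by_id[target]
    return "".join(body.itertext())
```

Applied to the example, apply_transposition(BODY, POINTERS) yields "where he sits head on hands at his table."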


Some passages or signs do not, strictly speaking, belong to the version: paralipomena, dates and place names, numberings, stamps and 'metamarks', defined by the TEI as 'any kind of graphic or written signal within a document the function of which is to determine how it should be read rather than forming part of the actual content of the document'. In the BDMP these features are encoded as follows:

<metamark>: indicates metamarks, such as 'Stet' as a way of undoing a cancellation; or, for instance, two corresponding instances of the letter 'A' indicating where an addition is to be inserted.
<stamp>: indicates a stamp of the holding library.
<num>: indicates the page number as it is presented on the page. A 'type' attribute specifies whether these numbers were prenumbered in the notebook, written by Beckett, or added by an archivist.
<floatingText>: indicates an archive number written on a document by the archivist, which is treated as a floating text as defined by the TEI.
<date>: to encode dates.


[1] "pb", "stemma", "time", "section", "trans", "orig", "textn", "over" and "chrono" have been added to the global attributes. The attributes "version" and "zone" have been added to the tags <text>, <div>, <p>, <sp>, <l>, <lg>, <stage> and <seg>. A <sub> tag has been added.
[6] The differences between the two versions of TEI P5 come down to these differences between L'Innommable / The Unnamable and Stirrings Still / Soubresauts: <metamark> vs. <ge:metamark>, <listTranspose> vs. <ge:transposeGrp>, <transpose> vs. <ge:transpose>, <handNotes> and <handNote> vs. <handList> and <hand>.
[7] From 2011 to 2021 the edition was published as a Cocoon web application inside the Apache Tomcat servlet container. The search engine made use of Elasticsearch.
[8] The chronology of the sentence versions relating to the first 24 pages of the first English typescript of The Unnamable differs from the chronology of the rest of the text. A more detailed analysis is made under "Chronology" in the L'Innommable / The Unnamable module.