Samuel Beckett
Digital Manuscript Project

Brief Technical Documentation

This is the Brief Technical Documentation as it was published on 24/06/2011. The document has since been updated.
The electronic edition is encoded in XML (eXtensible Markup Language). The encoding design started from the P5 version of the Guidelines of the TEI (Text Encoding Initiative) and extends the DTD (Document Type Definition) with project-specific tags and attributes where needed. The proposals for the encoding of genetic editions made by the TEI Special Interest Group (SIG) on modern manuscripts have also been taken into account, and several of their recommendations have been incorporated. The encoding is based on Peter Shillingsburg's definitions of crucial notions such as 'document', 'text', 'version' and 'work' in his book Scholarly Editing in the Computer Age, notably in the chapter entitled 'Ontology'. The edition is published in a Java framework. [1]


The header contains metadata such as a title statement and a publication statement (giving the contact details of the Centre for Manuscript Genetics, University of Antwerp), a brief source description, and a profile description with information on the languages and the handwritings in the document.


<text>: This tag is used to indicate 'the actual order of words and punctuation as contained in any one physical form' (Shillingsburg 1996: 46). In the case of Beckett's last works, the physical form is paper and ink. As a physical vessel, the document contains only one text, but it may contain more than one version of more than one work, a 'work' being 'the message or experience implied by the authoritative versions of a literary writing' (Shillingsburg 1996: 176). The archive catalogue number serves as the unique id of the text.

<div>: In the edition, this tag is used to indicate a version, i.e. 'one specific form of the work - the one the author intended at some particular moment in time' (Shillingsburg 1996: 44). The writing layers are indicated by means of <del> (deletion) and <add> (addition) tags.

<div type="paralipomena">: Apart from versions, a document may also contain fragments of text (jottings, notes, reflections, try-out sentences and so forth) which, strictly speaking, do not belong to a version of a work. These paralipomena are indicated by means of the tag <div type="paralipomena">.

<p>: Sections or paralipomena may consist of several paragraphs.

<seg>: Each paragraph usually consists of several sentences. When Beckett did not work with full sentences (e.g. in Not I / Pas moi), the segment consists of a few lines of text, i.e. a unit of text that can easily be compared with other versions.


The xml:id of the <text> tag is the document's archive number. According to Peter Shillingsburg's definition, the variant forms of a work usually have the same name, but in some cases 'there will be disagreement over whether a variant form is in fact a variant version or a separate work' (176).
In its published form, the work (here Stirrings Still) consists of three sections, numbered 1, 2 and 3. Whenever a <div> can be identified as an early version of one of these three sections, its 'section' attribute takes the section number 1, 2 or 3:
<div section="1">
<div section="2">
<div section="3">
A few blocks of text on the extant documents cannot be identified with any of the three sections, and yet they are more than just loose jottings or paralipomena. The author developed them in several versions, until he decided that they led to a dead end. In the case of Stirrings Still there are 3 such abandoned sections. They are also encoded as <div>s, but their section number is preceded by a zero:
<div section="01">
<div section="02">
<div section="03">
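The leading-zero convention can be read mechanically: a 'section' value starting with 0 denotes an abandoned section, any other value a section of the published work. The Python sketch below assumes the section values on <div>s are exactly the digit strings shown above ('1' to '3', '01' to '03'); the function name section_kind is hypothetical and not part of the BDMP software.

```python
def section_kind(value: str) -> tuple[str, int]:
    """Classify a <div> 'section' value.

    "2"  -> ("published", 2)
    "02" -> ("abandoned", 2)   # the leading zero marks an abandoned section
    """
    if not value.isdigit():
        raise ValueError(f"unexpected section value: {value!r}")
    kind = "abandoned" if value.startswith("0") else "published"
    return kind, int(value)


for value in ["1", "2", "3", "01", "02", "03"]:
    print(value, section_kind(value))
```

Note that paragraph-level values such as "1.2" (used on <p>) are deliberately rejected here, since the sketch only covers <div> sections.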

In the 'n' attribute of a <seg>, the catalogue number is followed by the number of the sentence in the base text (see chapter "base texts"):
<seg n="MS-UoR-2934,[0127]">
In the case of a sentence that eventually did not make it into the base text, the number of the preceding sentence that did make it into the base text is followed by | and an extra number:
<seg n="MS-UoR-2934,[0127|001]">
The first number always consists of 4 digits: 0001 and so on; the second number, after the |, always consists of 3 digits. In the visualization, this extra sentence (or phrase) appears in bold, because it constitutes a deviation from the base text.
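Because the 'n' values follow a fixed pattern, they can be parsed and put in reading order programmatically. The following Python sketch assumes that catalogue numbers never contain a comma (true of the examples shown here); parse_seg_n and SegId are hypothetical names, not part of the BDMP software.

```python
import re
from typing import NamedTuple, Optional

# "MS-UoR-2934,[0127]" or "MS-UoR-2934,[0127|001]": catalogue number, comma,
# 4-digit base-text sentence number, optional 3-digit number after "|".
SEG_N = re.compile(r"^(?P<catalogue>[^,]+),\[(?P<base>\d{4})(?:\|(?P<extra>\d{3}))?\]$")


class SegId(NamedTuple):
    catalogue: str
    base: int              # base-text sentence (or the preceding one)
    extra: Optional[int]   # None when the sentence is part of the base text

    @property
    def deviates(self) -> bool:
        """True for sentences absent from the base text (shown in bold)."""
        return self.extra is not None

    def sort_key(self) -> tuple[int, int]:
        """Place an extra sentence right after the base sentence it follows."""
        return self.base, self.extra or 0


def parse_seg_n(value: str) -> SegId:
    match = SEG_N.match(value)
    if match is None:
        raise ValueError(f"unexpected seg/@n value: {value!r}")
    extra = match.group("extra")
    return SegId(match.group("catalogue"), int(match.group("base")),
                 None if extra is None else int(extra))


ids = [parse_seg_n(v) for v in
       ["MS-UoR-2934,[0128]", "MS-UoR-2934,[0127|001]", "MS-UoR-2934,[0127]"]]
for seg in sorted(ids, key=SegId.sort_key):
    print(seg.base, seg.extra, "bold" if seg.deviates else "")
```

Sorting on (base, extra) reproduces the intended order: sentence 0127, then the deviating sentence 0127|001, then 0128.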

The chronological order of the versions of each section is indicated by means of a 'version' attribute, e.g. the seventeenth version of the first section:
<div section="1" version="17">
In the case of partial versions, the version number is followed by a letter (e.g. typescript version 12 contains a redraft of its last paragraph; this redrafted paragraph is indicated by the number 12a).
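Since version values mix numbers and letters, a plain string sort would put '12a' before '2'. A small Python sketch of a numeric sort key, assuming a version value is a number optionally followed by one lowercase letter (the function name version_key is hypothetical):

```python
import re

# A version value: digits, optionally followed by a letter for a partial version.
VERSION = re.compile(r"^(\d+)([a-z]?)$")


def version_key(value: str) -> tuple[int, str]:
    """Sort key so that "2" precedes "12" and "12a" directly follows "12"."""
    match = VERSION.match(value)
    if match is None:
        raise ValueError(f"unexpected version value: {value!r}")
    return int(match.group(1)), match.group(2)


print(sorted(["12a", "2", "17", "12"], key=version_key))  # ['2', '12', '12a', '17']
```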

The 'version' attribute indicates the chronological sequence of the versions of a single section, whereas the 'time' attribute indicates the chronological sequence of all the <div>s, irrespective of their section.

Since there are 3 sections and 3 abandoned sections, there are 6 sections in all; the 'chrono' attribute indicates their chronological order, whether they made it into the published version or not.

The 'xml:lang' attribute indicates the language in which the version is written.

trans / orig
These attributes are only relevant for bilingual works. If a version was translated by the author (sometimes already during the writing process), the source text is encoded with an 'orig' attribute and the target text with a 'trans' attribute, both carrying the same number to relate them to each other. For instance, the 18th version of section 1 is a translation of version 17. It is the third translation in the genetic dossier, hence orig="03" and trans="03".
<div section="1" chrono="4" version="17" xml:lang="EN" orig="03">
<div section="1" chrono="4" version="18" xml:lang="FR" trans="03">
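The shared number makes it straightforward to pair source and target texts. The Python sketch below uses the standard xml.etree.ElementTree module on a namespace-free sample; real TEI files declare the TEI namespace (http://www.tei-c.org/ns/1.0), so element names would need that prefix. It assumes each number occurs on exactly one 'orig' and one 'trans' <div>; translation_pairs is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def translation_pairs(root: ET.Element) -> dict[str, tuple[ET.Element, ET.Element]]:
    """Map each translation number to its (source <div>, target <div>) pair."""
    sources = {d.get("orig"): d for d in root.iter("div") if d.get("orig")}
    targets = {d.get("trans"): d for d in root.iter("div") if d.get("trans")}
    # Only numbers present on both sides form a complete pair.
    return {n: (sources[n], targets[n]) for n in sorted(sources.keys() & targets.keys())}


SAMPLE = """<text>
  <div section="1" chrono="4" version="17" xml:lang="EN" orig="03"/>
  <div section="1" chrono="4" version="18" xml:lang="FR" trans="03"/>
</text>"""

for number, (source, target) in translation_pairs(ET.fromstring(SAMPLE)).items():
    print(number, source.get("version"), source.get(XML_LANG), "->",
          target.get("version"), target.get(XML_LANG))  # 03 17 EN -> 18 FR
```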
The attributes are combined to allow readers to retrieve the transcripts from different perspectives:
The documents can be studied in the order of their catalogue numbers.
This option only requires the xml:id of the <text> tag:
<text xml:id="MS-UoR-2933-1">
A chronological approach rearranges the transcripts of the drafts in the order of their composition. This option combines the section, time and chrono attributes of the <div> tag:
<div section="1" chrono="4" version="17" xml:lang="EN" orig="03" time="45">
The rearrangement per language shows that Beckett often switched between French and English during the writing process.
<div section="1" chrono="4" version="17" xml:lang="EN" orig="03" time="45">
Translations (i.e. authorial translations) are distinguished from versions that were written directly in the target language:
<div section="1" chrono="4" version="17" xml:lang="EN" orig="03" time="45">
<div section="1" chrono="4" version="18" xml:lang="FR" trans="03" time="48">
The 'Compare versions' approach arranges the versions that did make it into the published text according to their position in the narrative structure. For this option, the section and version attributes suffice:
<div section="1" chrono="4" version="17" xml:lang="EN" orig="03" time="45">
The textual material can be rearranged from several perspectives (see menu) by combining different attributes:

Perspective        Element   Attributes
Documents          <text>    xml:id
Chronology         <div>     section + time + chrono
Language           <div>     section + version + xml:lang
Translations       <div>     section + trans/orig
Compare versions   <div>     section + version
                   <p>       section + version
                   <seg>     n + version
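The table can be read as a set of sort keys over the <div>s. The following Python sketch is one possible reading, not the BDMP implementation: it sorts by 'time' for the chronology (leaving 'chrono' aside), by language, section and version for the language view, and by section and version for 'Compare versions'. The <collection> wrapper, the document ids and all attribute values in the sample are invented for illustration, the helper names are hypothetical, and the sample omits the TEI namespace.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def version_key(div: ET.Element) -> tuple[int, str]:
    """'12a' -> (12, 'a'): partial versions sort right after their full version."""
    value = div.get("version", "0")
    digits = value.rstrip("abcdefghijklmnopqrstuvwxyz")
    return int(digits), value[len(digits):]


# One possible reading of the table as sort keys (not the BDMP code).
PERSPECTIVES = {
    "Chronology": lambda d: int(d.get("time", "0")),
    "Language": lambda d: (d.get(XML_NS + "lang", ""), d.get("section", ""), version_key(d)),
    "Compare versions": lambda d: (d.get("section", ""), version_key(d)),
}


def documents(root: ET.Element) -> list[str]:
    """'Documents' view: texts in the order of their catalogue numbers (xml:id)."""
    return sorted(t.get(XML_NS + "id", "") for t in root.iter("text"))


def rearrange(root: ET.Element, perspective: str) -> list[str]:
    """Return 'section/version' labels of the version <div>s in the given order."""
    divs = [d for d in root.iter("div") if d.get("section")]
    divs.sort(key=PERSPECTIVES[perspective])
    return [f"{d.get('section')}/{d.get('version')}" for d in divs]


# Hypothetical sample: wrapper, ids and attribute values are invented.
SAMPLE = """<collection>
  <text xml:id="MS-B">
    <div section="1" version="18" xml:lang="FR" trans="03" time="48"/>
    <div section="2" version="4" xml:lang="FR" time="47"/>
  </text>
  <text xml:id="MS-A">
    <div section="2" version="3" xml:lang="EN" time="46"/>
    <div section="1" version="17" xml:lang="EN" orig="03" time="45"/>
  </text>
</collection>"""

root = ET.fromstring(SAMPLE)
print(documents(root))
for name in PERSPECTIVES:
    print(name, rearrange(root, name))
```

Because section values are compared as strings, abandoned sections ("01" to "03") sort before published ones ("1" to "3") in this sketch.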
The numbering of the sections, paragraphs and sentences enables the user to adapt the size of the textual unit s/he wishes to compare.
<div section="1" chrono="4" version="17">
<p section="1.2" version="17">
<seg n="MS-UoR-2933-1,[0055]" version="17">
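Choosing the size of the unit then amounts to choosing which element to collect across versions. A minimal Python sketch, assuming that corresponding <seg>s in different documents share the bracketed base-text number while their catalogue numbers differ (so the catalogue part is dropped before matching); the helper names, the second catalogue number and the text content are placeholders, and the raw text includes deleted as well as added words.

```python
import xml.etree.ElementTree as ET


def unit_key(el: ET.Element) -> str:
    """Key shared by corresponding units in different versions."""
    if el.tag == "seg":
        # Drop the catalogue number: "MS-UoR-2933-1,[0055]" -> "[0055]".
        return el.get("n", "").split(",", 1)[-1]
    return el.get("section", "")


def compare(root: ET.Element, tag: str, key: str) -> dict[str, str]:
    """Map version -> whitespace-normalized text of one unit at one granularity."""
    return {el.get("version", ""): " ".join("".join(el.itertext()).split())
            for el in root.iter(tag) if unit_key(el) == key}


# Hypothetical sample: "MS-EXAMPLE" and the readings are placeholders.
SAMPLE = """<text>
  <div section="1" version="17">
    <p section="1.2" version="17">
      <seg n="MS-UoR-2933-1,[0055]" version="17">reading A</seg>
    </p>
  </div>
  <div section="1" version="18">
    <p section="1.2" version="18">
      <seg n="MS-EXAMPLE,[0055]" version="18">reading B</seg>
    </p>
  </div>
</text>"""

root = ET.fromstring(SAMPLE)
for tag, key in [("div", "1"), ("p", "1.2"), ("seg", "[0055]")]:
    print(tag, compare(root, tag, key))
```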

Textual Alterations

The most frequently occurring tags in the XML transcriptions are deletions and additions:

<del>: For each cancelled phrase, the type of cancellation, the author of the cancellation and the writing tool are indicated, as well as the person responsible for the transcription (the editor):
<del type="crossOut" hand="#SB" rend="blackink" resp="#DVH">...</del>

In the case of instant alterations (currente calamo), the type attribute value is 'instant correction'. Instant corrections are only marked when it is certain that the cancellation cannot have been made at a later stage. For instance, in the sentence 'perhaps not again never to be heard again', 'not again' is marked as being followed by an instant correction; 'to be heard' is not, because that cancellation may have been made at a later stage.

<delSpan spanTo="#anchor"/>: For passages cancelled by Beckett or 'marked as used', three types of cancellation can be distinguished: heavy crossing out, a diagonal line, or a St Andrew's cross.
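Unlike <del>, a <delSpan> is empty and marks only where the cancellation starts; in TEI its spanTo attribute points to an element, typically an <anchor> with a matching xml:id, that marks where it ends. A minimal Python sketch for collecting the cancelled text, assuming the <delSpan> and its <anchor> are siblings within the same element (a simplification, since real spans may cross element boundaries); delspan_text is a hypothetical helper and the sample words are placeholders.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def delspan_text(parent: ET.Element, delspan: ET.Element) -> str:
    """Text cancelled by `delspan`, read up to the <anchor> its spanTo points at.

    Assumes the anchor is a later sibling of the <delSpan> inside `parent`.
    """
    target = delspan.get("spanTo", "").lstrip("#")
    children = list(parent)
    parts = [delspan.tail or ""]  # text right after the empty <delSpan/>
    for child in children[children.index(delspan) + 1:]:
        if child.tag == "anchor" and child.get(XML_ID) == target:
            return " ".join("".join(parts).split())
        parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    raise ValueError(f"no <anchor> with xml:id {target!r} after the <delSpan>")


p = ET.fromstring('<p>kept <delSpan spanTo="#a1"/>cancelled <hi>words</hi> here'
                  '<anchor xml:id="a1"/> kept again</p>')
print(delspan_text(p, p.find("delSpan")))  # cancelled words here
```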

<add>: For additions the place of the addition is also indicated:
<add place="marginleft" hand="#SB" rend="blackink" resp="#DVH">...</add>
The place indications used in the present edition are:
'marginleft', 'marginright', 'margintop', 'marginbottom', 'facingleaf', 'inline', 'supralinear', 'infralinear', 'overwritten'.
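A closed list of place values lends itself to an automatic consistency check. The Python sketch below flags <add> elements whose place value is missing or not among the values listed above; the helper name is hypothetical, and declaring the values as an enumerated attribute type in the DTD would enforce the same constraint at validation time.

```python
import xml.etree.ElementTree as ET

# Place values used in this edition, as listed above.
ALLOWED_PLACES = {
    "marginleft", "marginright", "margintop", "marginbottom", "facingleaf",
    "inline", "supralinear", "infralinear", "overwritten",
}


def unknown_places(root: ET.Element) -> list[str]:
    """Place values of <add> elements outside the list ('' if place is missing)."""
    return [add.get("place", "") for add in root.iter("add")
            if add.get("place") not in ALLOWED_PLACES]


sample = ET.fromstring(
    '<p>text <add place="marginleft">one</add> <add place="above">two</add>'
    ' <add>three</add></p>'
)
print(unknown_places(sample))  # ['above', '']
```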

Open variants: alternative readings

The TEI 'Genetic Editions' SIG proposes to mark open variants as 'alternative readings':
when at last out again <seg type="alternative" xml:id="alt1">he knew not</seg>
<add place="above" type="alternative" xml:id="alt2">no knowing</add>
The SIG proposal also suggests an <alt/> element that can attribute more weight to one of the two readings (e.g. weights="0 1"). Because such weighting implies a high degree of interpretation (sometimes involving the retroactive projection of a subsequent version onto a previous one), the BDMP does not apply this procedure.


To mark a transposition (when the author moves blocks of text to a different position, using arrows, asterisks, numbers or lines), the BDMP follows the SIG recommendations:
where he sits <seg type="transposition" xml:id="trans1">at his table</seg> <seg type="transposition" xml:id="trans2">head on hands</seg>.

<ptr target="#trans2"/>
<ptr target="#trans1"/>
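Reading a transposition then means substituting the transposed segments in the order given by the pointers. A minimal Python sketch, assuming the <ptr> targets are listed in the new reading order and that the transposed <seg>s are direct children of one element; the wrapper <p> and the helper names are hypothetical, and the sentence is the example above.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def is_transposed(el: ET.Element) -> bool:
    return el.tag == "seg" and el.get("type") == "transposition"


def transposed_text(parent: ET.Element, targets: list[str]) -> str:
    """Text of `parent` with its transposition <seg>s reordered.

    `targets` holds the <ptr> target values ("#trans2", ...) in reading order.
    """
    segs = {s.get(XML_ID): s for s in parent if is_transposed(s)}
    reordered = iter([segs[t.lstrip("#")] for t in targets])
    parts = [parent.text or ""]
    for child in parent:
        # Each transposed slot is filled by the next segment in pointer order;
        # the tail text stays where it was in the document.
        source = next(reordered) if is_transposed(child) else child
        parts.append("".join(source.itertext()))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


p = ET.fromstring(
    '<p>where he sits <seg type="transposition" xml:id="trans1">at his table</seg> '
    '<seg type="transposition" xml:id="trans2">head on hands</seg>.</p>'
)
print(transposed_text(p, ["#trans2", "#trans1"]))  # where he sits head on hands at his table.
```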


Some passages or signs do not, strictly speaking, belong to the version: paralipomena, dates and place names, numberings, stamps and 'metamarks', which the TEI SIG defines as 'symbols introduced by the writer in a document expressly for the purpose of indicating how the text is to be read'. In the BDMP these features are encoded as follows:

<ge:metamark>: indicates metamarks, such as 'Stet' as a way of undoing a cancellation; or, for instance, two corresponding instances of the letter 'A' indicating where an addition is to be inserted.
<stamp>: indicates a stamp of the holding library.
<num>: indicates the page number as it appears on the page. A 'type' attribute specifies whether the page was prenumbered in the notebook, numbered by Beckett, or numbered by an archivist.
<floatingText>: indicates the archive number that was written on the documents by the archivist, not by Beckett.
<date>: The TEI SIG 'consider[s] as metamarks dates introduced to mark the beginning of a manuscript or revision, but not forming part of it'. However, pieces of text that are part of the manuscript but not part of the version are generally encoded as paralipomena (<div type="paralipomena">), and metamarks are instructions by the author on how the text is to be pieced together; the BDMP therefore uses a separate <date> tag.


Genetic Variants (rewritings)

Rewritings (variants between the 'top layer' of different versions in the genetic dossier) are marked by means of <rdg> tags.
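Comparing top layers presupposes a way of extracting them. The Python sketch below is an illustration under stated assumptions, not the BDMP's collation: it drops <del> content and the non-version features described earlier (metamarks, stamps, page numbers, archive numbers, dates), keeps additions, and lists word-level differences with the standard difflib module. Element names are matched without their namespace, so ge:metamark is caught as well; <delSpan> ranges are not handled. The helper names and sample text are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET

# Local element names excluded from the top layer.
SKIP = {"del", "metamark", "stamp", "num", "floatingText", "date"}


def local_name(tag: str) -> str:
    """'{namespace}metamark' -> 'metamark'."""
    return tag.rsplit("}", 1)[-1]


def _raw_top_layer(el: ET.Element) -> str:
    parts = [el.text or ""]
    for child in el:
        if local_name(child.tag) not in SKIP:
            parts.append(_raw_top_layer(child))
        parts.append(child.tail or "")  # tail text follows the element, so keep it
    return "".join(parts)


def top_layer(el: ET.Element) -> str:
    """Last writing layer: additions kept, deletions and non-version marks dropped."""
    return " ".join(_raw_top_layer(el).split())


def rewritings(a: str, b: str) -> list[tuple[str, str]]:
    """Word-level differences between two top-layer texts as (old, new) pairs."""
    old, new = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=old, b=new)
    return [(" ".join(old[i1:i2]), " ".join(new[j1:j2]))
            for op, i1, i2, j1, j2 in matcher.get_opcodes() if op != "equal"]


v1 = ET.fromstring('<seg>the <del rend="blackink">old</del> '
                   '<add place="supralinear">new</add> reading<stamp>stamp</stamp></seg>')
v2 = ET.fromstring('<seg>the newer reading</seg>')
print(top_layer(v1), "|", top_layer(v2))
print(rewritings(top_layer(v1), top_layer(v2)))  # [('new', 'newer')]
```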

Translation variants

Mismatches between the English and French versions are marked by means of <rdg> tags with the 'type' attribute value 'trans'. The absence of a word or word string that appears in the corresponding translation or original is indicated by means of an empty <rdg> whose 'rend' attribute value is 'absence':
<rdg type="trans" rend="absence"/>
In the BDMP this absence is visualized by means of a vertical bar.
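A renderer only needs to substitute a marker for each empty absence reading. A minimal Python sketch producing plain text with a vertical bar, assuming the <rdg> may appear directly in running text as in the example above; the helper name and the sample words are hypothetical.

```python
import xml.etree.ElementTree as ET


def render(el: ET.Element) -> str:
    """Plain-text rendering in which an absent translation reading shows as '|'."""
    parts = [el.text or ""]
    for child in el:
        if (child.tag == "rdg" and child.get("type") == "trans"
                and child.get("rend") == "absence"):
            parts.append("|")
        else:
            parts.append(render(child))  # other readings keep their text
        parts.append(child.tail or "")
    return "".join(parts)


seg = ET.fromstring('<seg>alpha <rdg type="trans" rend="absence"/> beta '
                    '<rdg type="trans">gamma</rdg></seg>')
print(render(seg))  # alpha | beta gamma
```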


[1] The edition is published as a Cocoon web application inside the Apache Tomcat servlet container. The search engine makes use of the eXist XML retrieval engine.