Samuel Beckett
Digital Manuscript Project

Brief Technical Documentation

This document has been updated for the publication of the second module: L'Innommable / The Unnamable (September 2013). The document in its first state (published on 24/06/2011) can be found here.
The electronic edition is encoded in XML (eXtensible Markup Language). The encoding design started from the P5 version of the Guidelines of the TEI (Text Encoding Initiative) and expands the DTD (Document Type Definition) with project-specific tags and attributes where needed.[1] The L'Innommable/The Unnamable module is based on version 2.3.0 of TEI P5,[2] while the first module, published in 2011, is based on version 1.0.0.[3] The tagset of the first module also incorporated some tags from a working document of the TEI SIG "Manuscripts".[4] A number of these proposed tags were introduced into TEI P5 version 2.0.0,[5] some in modified form. Because of these modifications, and with the following modules in mind, we decided to comply with the P5 version that was current at the time the XML transcriptions of this module were finished (June 2013).[6]
The encoding is based on the definitions of crucial notions such as 'document', 'text', 'version', and 'work' by Peter Shillingsburg in his book Scholarly Editing in the Computer Age, notably the chapter entitled 'Ontology'. The edition is published in a Java framework. [7]
 

Metadata

The header contains metadata such as a title statement and a publication statement, which gives the contact details of the Centre for Manuscript Genetics (University of Antwerp); a brief source description; and a profile description with information on the languages and the hands in the document.
 

Structural tags

<text>: This tag is used to indicate 'the actual order of words and punctuation as contained in any one physical form' (Shillingsburg 1996: 46). The physical form is paper and ink. As a physical vessel the document contains only one text, but it may contain more than one version of more than one work - a 'work' being 'the message or experience implied by the authoritative versions of a literary writing' (Shillingsburg 1996: 176). The archive catalogue number serves as unique id.

<div>: Used without attributes, this tag indicates a version, i.e. 'one specific form of the work - the one the author intended at some particular moment in time' (Shillingsburg 1996: 44). The writing layers are indicated by means of <del> (deletion) and <add> (addition) tags.

<div type="paralipomena">: Apart from versions, a document may also contain fragments of text (jottings, notes, reflections, try-out sentences, and so forth) which strictly speaking do not belong to a version of a work. These paralipomena are indicated by means of the tag <div type="paralipomena">.

<p>: Versions or paralipomena may consist of several paragraphs.

<seg>: Each paragraph usually consists of several sentences. When Beckett did not work with full sentences (e.g. in Not I / Pas moi) the segment consists of a few lines of text, i.e. a unit of text that can easily be compared to other versions.

<anchor type="subsentence"/>: In L'Innommable/The Unnamable, a number of sentences are too long to allow for a sentence-by-sentence comparison. These sentences have been subdivided into two or more "subsentences" by means of anchors in the text.
 

Global attributes

xml:id
The xml:id of the <text> tag is the document's archive number. According to Peter Shillingsburg's definition, the variant forms of a work usually have the same name, but in some cases 'there will be disagreement over whether a variant form is in fact a variant version or a separate work' (Shillingsburg 1996: 176).

xml:lang
This attribute indicates the language in which the version is written.

n
The catalogue number is followed by the number of the sentence in the base text (see chapter "base texts"):
<seg n="MS-UoR-2934,[0127]">
In the case of a sentence that eventually did not make it into the base text, the number of the preceding sentence that did make it into the base text is followed by | and an extra number:
<seg n="MS-UoR-2934,[0127|001]">
The first number always consists of 4 digits: 0001 and so on; the second number, after the |, always consists of 3 digits. In the visualization, this extra sentence (or phrase) appears in bold, because it constitutes a deviation from the base text.
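
The numbering scheme above can be unpacked mechanically. The following is a minimal sketch in Python, assuming the n value always has the shape shown in the two examples (catalogue number, comma, bracketed 4-digit number, optional | and 3-digit number). The names N_PATTERN, SentenceRef and parse_n are hypothetical and not part of the edition's code.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape, inferred from the two examples above:
# catalogue number, comma, then [NNNN] or [NNNN|NNN].
N_PATTERN = re.compile(
    r"^(?P<catalogue>[^,]+),\[(?P<base>\d{4})(?:\|(?P<extra>\d{3}))?\]$"
)


class SentenceRef(NamedTuple):
    catalogue: str        # archive catalogue number of the document
    base: int             # sentence number in the base text
    extra: Optional[int]  # set only for sentences absent from the base text


def parse_n(value: str) -> SentenceRef:
    """Split an n attribute value into its three components."""
    match = N_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unexpected n value: {value!r}")
    extra = match.group("extra")
    return SentenceRef(
        match.group("catalogue"),
        int(match.group("base")),
        int(extra) if extra is not None else None,
    )


print(parse_n("MS-UoR-2934,[0127]"))
print(parse_n("MS-UoR-2934,[0127|001]"))
```

A sentence whose extra component is not None is one that did not make it into the base text, which is the case the visualization renders in bold.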

version
This attribute indicates the chronological order of the versions of a textual unit (section, paragraph, sentence).

In L'Innommable/The Unnamable, the chronology of versions largely corresponds to the chronology of the documents.[8] Only in cases where there is more than one version of the same sentence within one document, and where the order of writing does not correspond to the documentary order, has a version attribute been added to the <seg> tag to encode the correct chronology.

In Stirrings Still/Soubresauts, the chronology is a lot more complex and version attributes have been added to all sections, paragraphs and sentences.
In the case of partial versions the version number is followed by a letter (e.g. typescript version 12 of Stirrings Still/Soubresauts contains a redraft of its last paragraph; this redrafted paragraph is indicated by the number 12a).
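
Because version values may carry a letter suffix, they cannot be sorted as plain numbers or plain strings. A minimal sketch of a sort key, assuming that a value is a number optionally followed by one lowercase letter and that a partial version such as 12a should be listed directly after version 12 (both are assumptions; version_key is a hypothetical helper):

```python
import re
from typing import Tuple


def version_key(version: str) -> Tuple[int, str]:
    """Sort key for version values such as '3', '12' or '12a'."""
    match = re.fullmatch(r"(\d+)([a-z]?)", version)
    if match is None:
        raise ValueError(f"unexpected version value: {version!r}")
    # The empty suffix sorts before any letter, so '12' precedes '12a'.
    return int(match.group(1)), match.group(2)


print(sorted(["12a", "2", "12", "3"], key=version_key))
```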

zone
<seg> tags have a zone attribute which holds the name(s) of the zone(s) on a page in the image / text feature that the sentence is a part of.

section (only applied in Stirrings Still/Soubresauts)
In its published form, Stirrings Still/Soubresauts consists of three sections, numbered 1, 2, and 3. Whenever a <div> can be identified as an early version of one of these three sections, it carries a section attribute with the value 1, 2, or 3.
<div section="1">
<div section="2">
<div section="3">
A few blocks of text on the extant documents cannot be identified with any of the three sections, and yet they are more than just loose jottings or paralipomena. The author developed them in several versions, until he decided that this was a dead end. In the case of Stirrings Still there are three abandoned sections. They are also encoded as <div>s, but their section number is preceded by a zero:
<div section="01">
<div section="02">
<div section="03">

time (only applied in Stirrings Still/Soubresauts)
The 'version' attribute indicates the chronological sequence of versions of one single section, whereas the 'time' attribute indicates the chronological sequence of all the <div>s, irrespective of the sections.

chrono (only applied in Stirrings Still/Soubresauts)
Since there are 3 sections and 3 abandoned sections, there are 6 sections in all; the 'chrono' attribute indicates their chronological order, whether they made it into the published version or not.

trans / orig (only applied in Stirrings Still/Soubresauts and Comment dire/ what is the word)
These attributes are only relevant for bilingual works: if a version was translated by the author (sometimes already during the writing process) the source text is encoded with the attribute 'orig' and the target text with the attribute 'trans', both followed by the same number relating them to each other. For instance, the 18th version of section 1 is a translation of version 17. It is the third translation in the genetic dossier, hence the code orig="03" and trans="03".
<div section="1" chrono="4" version="17" xml:lang="EN" orig="03">
<div section="1" chrono="4" version="18" xml:lang="FR" trans="03">
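
The shared number in orig and trans makes it possible to pair a source text with its authorial translation. The following is a minimal Python sketch using the two <div> tags above. The wrapping <body> element, the empty <div>s and the function pair_translations are assumptions made for illustration, and the TEI namespace that real transcription files would normally declare is left out.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical wrapper around the two <div> tags quoted above.
SAMPLE = """<body>
<div section="1" chrono="4" version="17" xml:lang="EN" orig="03"/>
<div section="1" chrono="4" version="18" xml:lang="FR" trans="03"/>
</body>"""


def pair_translations(root):
    """Return (source, translation) pairs linked by the same orig/trans number."""
    sources = {d.get("orig"): d for d in root.iter("div") if d.get("orig")}
    pairs = []
    for div in root.iter("div"):
        number = div.get("trans")
        if number and number in sources:
            pairs.append((sources[number], div))
    return pairs


root = ET.fromstring(SAMPLE)
pairs = pair_translations(root)
for source, target in pairs:
    print(source.get("version"), source.get(XML_LANG), "->",
          target.get("version"), target.get(XML_LANG))
```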
The attributes are combined to allow readers to retrieve the transcripts from different perspectives:
The documents can be studied in the order of their catalogue numbers.
This option only requires the xml:id of the <text> tag:
<text xml:id="MS-UoR-2933-1">
A chronological approach rearranges the transcripts of the drafts in the order of their composition. This option is a combination of the section, time and chrono attributes in the <div> tag:
<div section="1" chrono="4" version="17" xml:lang="EN" orig="03" time="45">
The rearrangement per language shows that Beckett often switched between French and English during the writing process.
<div section="1" chrono="4" version="17" xml:lang="EN" orig="03" time="45">
Translations (i.e. authorial translations) are distinguished from versions that were written directly in the target language:
<div section="1" chrono="4" version="17" xml:lang="EN" orig="03" time="45">
<div section="1" chrono="4" version="18" xml:lang="FR" trans="03" time="48">
The 'Compare versions' approach arranges the versions that did make it into the published text according to their position in the narrative structure. For this option the section and version attributes suffice:
<div section="1" chrono="4" version="17" xml:lang="EN" orig="03" time="45">
The textual material can be rearranged from several perspectives (see menu) by combining different attributes:
Perspective        Tag      Attributes
Documents          <text>   xml:id
Chronology         <div>    section + time + chrono
Language           <div>    section + version + xml:lang
Translations       <div>    section + trans/orig
Compare versions   <div>    section + version
                   <p>      section + version
                   <seg>    n + version
The numbering of the sections, paragraphs and sentences enables the user to adapt the size of the textual unit s/he wishes to compare.
<div section="1" chrono="4" version="17">
<p section="1.2" version="17">
<seg n="MS-UoR-2933-1,[0055]" version="17">
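
How such attribute combinations can drive different arrangements may be sketched as follows. This is an illustrative Python example only, not the edition's Cocoon pipeline. The <body> wrapper and the helper names chronological and by_language are assumptions, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical wrapper around two <div> tags quoted in this section,
# deliberately listed out of chronological order.
SAMPLE = """<body>
<div section="1" chrono="4" version="18" xml:lang="FR" trans="03" time="48"/>
<div section="1" chrono="4" version="17" xml:lang="EN" orig="03" time="45"/>
</body>"""


def chronological(divs):
    """Order <div>s by their overall position in time."""
    return sorted(divs, key=lambda d: int(d.get("time")))


def by_language(divs, lang):
    """Keep only the <div>s written in the given language."""
    return [d for d in divs if d.get(XML_LANG) == lang]


divs = list(ET.fromstring(SAMPLE).iter("div"))
print([d.get("version") for d in chronological(divs)])
print([d.get("version") for d in by_language(divs, "FR")])
```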
 

Textual Alterations

The most frequently occurring tags in the XML transcriptions are deletions and additions:

<del>: For each cancelled phrase the type of cancellation, the author of the cancellation, and the writing tools are indicated, as well as the person responsible for the transcription (the editor):
<del type="crossOut" hand="#SB" rend="black ink" resp="#DVH">...</del>

In the case of instant alterations (currente calamo) the type attribute value is 'instant correction'. Instant corrections are only marked if there is no doubt that the cancellation cannot have been introduced at a later stage: for instance in the sentence 'perhaps not again never to be heard again', 'not again' is marked as being followed by an instant correction; 'to be heard' is not, because the cancellation may have been introduced at a later stage.

<delSpan spanTo="#anchor"/>: For passages cancelled by Beckett or 'marked as used', three types can be distinguished: heavy crossing-out, a diagonal line, or a St. Andrew's cross.

<add>: For additions the place of the addition is also indicated:
<add place="marginleft" hand="#SB" rend="black ink" resp="#DVH">...</add>
The place indications used in the present edition are:
'marginleft', 'marginright', 'margintop', 'marginbottom', 'facingleaf', 'inline', 'supralinear', 'infralinear', 'overwritten'.
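
One way to read the final state of a transcribed passage is to drop the content of <del> and keep the content of <add>. The Python sketch below illustrates that idea under several assumptions: the sample sentence is invented, the function final_reading is hypothetical, and spanning deletions marked with <delSpan> are not handled.

```python
import xml.etree.ElementTree as ET

# Invented sample; the attribute values follow the examples above.
SAMPLE = (
    '<seg>a <del type="crossOut" hand="#SB" rend="black ink" resp="#DVH">first</del>'
    '<add place="supralinear" hand="#SB" rend="black ink" resp="#DVH">second</add>'
    ' word</seg>'
)


def final_reading(elem):
    """Text of elem with <del> content dropped and all other content kept."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != "del":
            parts.append(final_reading(child))
        # The tail is the text after the child's closing tag; it belongs
        # to the parent and is kept even when the child is a deletion.
        parts.append(child.tail or "")
    return "".join(parts)


print(final_reading(ET.fromstring(SAMPLE)))
```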
 

Open variants: alternative readings

Open variants have been marked up in this way:
when at last out again <seg type="alternative" xml:id="alt1">he knew not</seg>
<add place="above" type="alternative" xml:id="alt2">no knowing</add>
 

Transpositions

Transpositions (when the author moves blocks of text to a different position, using arrows, asterisks, numbers or lines) have been marked up in this way:
where he sits <seg type="transposition" xml:id="trans1">at his table</seg> <seg type="transposition" xml:id="trans2">head on hands</seg>.


All transpositions are declared in the header:

<listTranspose>
<transpose>
<ptr target="#trans2"/>
<ptr target="#trans1"/>
</transpose>
</listTranspose>
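
In TEI, the order of the <ptr> elements inside <transpose> gives the order in which the indicated elements are to be recombined. A minimal Python sketch of applying the declaration above to the example sentence follows. The <p> wrapper and the function apply_transpositions are assumptions; the sketch swaps the text of the segments rather than the elements themselves, and only handles segments that contain plain text.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The example sentence, wrapped in a hypothetical <p>.
TEXT = (
    '<p>where he sits <seg type="transposition" xml:id="trans1">at his table</seg> '
    '<seg type="transposition" xml:id="trans2">head on hands</seg>.</p>'
)
HEADER = """<listTranspose>
<transpose>
<ptr target="#trans2"/>
<ptr target="#trans1"/>
</transpose>
</listTranspose>"""


def apply_transpositions(text_root, list_transpose):
    """Return the text with transposed segments read in their declared order."""
    for transpose in list_transpose.iter("transpose"):
        targets = [ptr.get("target").lstrip("#") for ptr in transpose.iter("ptr")]
        # The positions the segments occupy on the page, in document order.
        slots = [e for e in text_root.iter() if e.get(XML_ID) in targets]
        # Capture the original wording before any slot is overwritten.
        wording = {e.get(XML_ID): e.text for e in slots}
        for slot, target in zip(slots, targets):
            slot.text = wording[target]
    return "".join(text_root.itertext())


result = apply_transpositions(ET.fromstring(TEXT), ET.fromstring(HEADER))
print(result)
```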
 

Metamarks

Some passages or signs do not, strictly speaking, belong to the version: paralipomena, dates and place names, numberings, stamps and 'metamarks', defined by the TEI as 'any kind of graphic or written signal within a document the function of which is to determine how it should be read rather than forming part of the actual content of the document'. In the BDMP these features are encoded as follows:

<metamark>: indicates metamarks, such as 'Stet' as a way of undoing a cancellation; or, for instance, two corresponding instances of the letter 'A' indicating where an addition is to be inserted.
<stamp>: indicates a stamp of the holding library.
<num>: indicates the page number as it is presented on the page. A 'type' attribute specifies whether these numbers were prenumbered in the notebook, written by Beckett, or added by an archivist.
<floatingText>: An archive number that was written on a document by the archivist is encoded as a floatingText, as defined by the TEI.
<date>: encodes dates (only applied in Stirrings Still/Soubresauts).
 

Variants

Genetic Variants (rewritings)

Rewritings (variants between the 'top layer' of different versions in the genetic dossier) are marked by means of <rdg> tags.

Translation variants

Mismatches between the English and French versions are marked by means of <rdg> tags with a 'type' attribute value 'trans'. The absence of a word or word string that appears in the corresponding translation or original is indicated by means of a rend attribute with the value 'absence':
<rdg type="trans" rend="absence"/>
In the BDMP this absence is visualized by means of a vertical bar.

Notes:

[1] "pb", "stemma", "time", "section", "trans", "orig", "textn", "over" and "chrono" have been added to the global attributes. The attributes "version" and "zone" have been added to the tags <text>, <div>, <p>, <sp>, <l>, <lg>, <stage> and <seg>. A <sub> tag has been added.
[6] The differences between the two versions of TEI P5 come down to these differences between L'Innommable / The Unnamable and Stirrings Still / Soubresauts: <metamark> vs. <ge:metamark>, <listTranspose> vs. <ge:transposeGrp>, <transpose> vs. <ge:transpose>, <handNotes> and <handNote> vs. <handList> and <hand>.
[7] The edition is published as a Cocoon web application inside the Apache Tomcat servlet container (http://tomcat.apache.org/). The search engine makes use of Elasticsearch (https://www.elastic.co/).
[8] The chronology of the sentence versions relating to the first 24 pages of the first English typescript of The Unnamable differs from the chronology of the rest of the text. A more detailed analysis is made under "Chronology" in the L'Innommable / The Unnamable module.