The emroon Archive

Welcome to the new emroon Archive, a collection of morphologically and etymologically annotated Old Norse texts.

The Archive consists of two parts: the Corpus and the Lexicon.

Corpus

The Corpus consists of a collection of Old Norse primary sources:

  • Entire manuscripts,
  • Excerpts from manuscripts,
  • Manuscript fragments,
  • Charters (letters, diplomas),
  • Runic inscriptions.

All sources are available in facsimile and diplomatic transcription as well as in normalization.

Lexicon

The Lexicon contains all lexical data needed for the annotation of the Corpus. It consists of several layers:

  • The Lexicon proper, a list of Old Norse lemmata,
  • Morphological forms, assigned an analysis and a lemma,
  • Morphemes, the building blocks of individual morphological forms,
  • Diaphonemes (formerly sound positions), abstract phono-referential units that differ from phonemes in not reflecting diachronic and diatopic variation.
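
The way these layers refer to one another can be sketched as a small data model. This is only an illustration: the type names, field names, and the sample entry are assumptions, not the Archive's actual schema, and the diaphoneme layer is left out for brevity.

```typescript
// Hypothetical shapes for three of the Lexicon layers (all names are assumptions).
interface Lemma {
  id: string;
  headword: string;
}

interface Morpheme {
  id: string;
  shape: string;
}

interface Form {
  spelling: string;
  analysis: string;      // morphological analysis assigned to the form
  lemmaId: string;       // the lemma the form belongs to
  morphemeIds: string[]; // building blocks of the form, in order
}

interface Lexicon {
  lemmata: Map<string, Lemma>;
  morphemes: Map<string, Morpheme>;
  forms: Form[];
}

// Resolve a form's lemma and morphemes into a readable one-line description.
function describeForm(lexicon: Lexicon, form: Form): string {
  const lemma = lexicon.lemmata.get(form.lemmaId);
  if (!lemma) throw new Error(`Unknown lemma: ${form.lemmaId}`);
  const segments = form.morphemeIds.map((id) => {
    const morpheme = lexicon.morphemes.get(id);
    if (!morpheme) throw new Error(`Unknown morpheme: ${id}`);
    return morpheme.shape;
  });
  return `${form.spelling} = ${segments.join("-")} (${form.analysis} of ${lemma.headword})`;
}

// Illustrative sample entry, not taken from the Archive's data.
const exampleLexicon: Lexicon = {
  lemmata: new Map<string, Lemma>([["konungr", { id: "konungr", headword: "konungr" }]]),
  morphemes: new Map<string, Morpheme>([
    ["konung", { id: "konung", shape: "konung" }],
    ["s", { id: "s", shape: "s" }],
  ]),
  forms: [
    {
      spelling: "konungs",
      analysis: "genitive singular",
      lemmaId: "konungr",
      morphemeIds: ["konung", "s"],
    },
  ],
};
```

Calling describeForm(exampleLexicon, exampleLexicon.forms[0]) resolves the form through both of the other layers.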

Perspective

The new emroon Archive will eventually replace all the functionality of both the legacy emroon page and MenotaBlitz.

Technology

The new emroon Archive is a Next.js application that runs in an Alpine Linux Docker container on an Ubuntu Linux server on NREC. The front end is exposed through an NGINX reverse proxy. Data is stored in a PostgreSQL database hosted on the same server.

All data exists primarily in a series of XML files. The emroon Compiler (a separate Node.js application) converts these XML files to JSON, pre-processes the result, and loads it into the database. The developer triggers this workflow manually.
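
A minimal sketch of what such an XML-to-JSON-to-database step might look like is given here. It is not the emroon Compiler's code: the element shape (a w element with lemma and msa attributes), the tokens table, and the function names are all assumptions made for illustration, and the regular expression stands in for a real XML parser.

```typescript
// JSON record produced from one annotated word (field names are assumptions).
interface TokenRecord {
  position: number;
  form: string;
  lemma: string;
  analysis: string;
}

// Extract records from a hypothetical element shape: <w lemma="..." msa="...">form</w>.
// A regular expression is enough for this sketch; real XML needs a proper parser.
function tokensFromXml(xml: string): TokenRecord[] {
  const pattern = /<w\s+lemma="([^"]*)"\s+msa="([^"]*)"\s*>([^<]*)<\/w>/g;
  const records: TokenRecord[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    records.push({
      position: records.length + 1, // 1-based position in document order
      lemma: match[1],
      analysis: match[2],
      form: match[3].trim(),
    });
  }
  return records;
}

// Build a parameterised PostgreSQL statement for one record.
// The table and column names are hypothetical.
function toInsert(record: TokenRecord): { text: string; values: (string | number)[] } {
  return {
    text: "INSERT INTO tokens (position, form, lemma, analysis) VALUES ($1, $2, $3, $4)",
    values: [record.position, record.form, record.lemma, record.analysis],
  };
}
```

Keeping the values separate from the statement text lets the database driver handle quoting, which matters for source text that contains apostrophes or other special characters.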

  • Sources in Corpus: 24
  • Tokens in Corpus: 89129
  • Lemmata in Lexicon: 10940
  • Forms in Lexicon: 38762
  • Morphemes in Lexicon: 3980