LaTeX Group

The LaTeX Group is for people interested in writing linguistic articles and books using TeX and LaTeX. We meet during the term from 11:00–12:00 on every other Friday.

TeX and LaTeX are software systems for document design and publishing. A growing number of linguists use them for their document authoring needs, and there are several good resources for writing linguistics articles and books with them, e.g. LaTeX4Ling and the Ling-TeX mailing list. The XeTeX extension supports Unicode input and arbitrary system fonts, which makes it easy to incorporate text in various orthographies and simplifies the use of linguistic symbols and IPA.
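As a rough illustration of the XeTeX workflow described above, the following minimal sketch types IPA characters directly as Unicode. It assumes the file is saved as UTF-8 and compiled with xelatex, and that the Charis SIL font is installed; the font choice is only an assumption, and any installed font that covers the IPA block should work in its place.

```latex
% Compile with: xelatex example.tex  (file saved as UTF-8)
\documentclass{article}
\usepackage{fontspec}      % font selection for XeLaTeX
\setmainfont{Charis SIL}   % assumption: this font is installed on the system
\begin{document}
% IPA characters are typed as-is; no special macros are needed.
The English word \emph{thing} is transcribed as [θɪŋ],
and the Greek word behind the name TeX as [ˈtɛxni].
\end{document}
```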

Background

TeX was originally developed at Stanford in the late 1970s by computer science professor Donald Knuth on the WAITS operating system that ran on room-filling PDP-10 computers. As such, TeX predates nearly all computers in use today, and it is consequently quite different from most modern computer software. Getting used to using it requires thinking somewhat differently from the point-and-click world. The learning curve is relatively steep, but the rewards from using TeX and LaTeX repay the learning effort many times over.

Levels of TeX

TeX is, like language, not a monolithic entity but a complicated system made up of many interoperating parts. Similar to the “great chain of being” in linguistics of phonetics > phonology > morphology > syntax > semantics > pragmatics, there is a notional hierarchy in TeX that is useful to keep in mind. It is outlined on the Levels of TeX page, which explains how the different parts of a TeX distribution work together to process documents.

Choosing a distribution

There are distributions of TeX available for Windows, Mac OS X, and Linux among other operating systems. Each has its own requirements and each works somewhat differently, but in general they are similar except for some installation and configuration differences, as well as their included non-TeX applications (editors, bibliography managers, etc.). Documents produced using one distribution will almost always work exactly the same in other distributions, although this does depend on how much a given document relies on obscure packages and fonts.

TeX Live is a cross-platform distribution of TeX and LaTeX which is available for Windows, Linux, Mac OS X, and some other Unix-based operating systems. However, there are more specialized distributions available for specific operating systems. TeX Live serves as a basis for the MacTeX and Linux distributions, but MiKTeX is somewhat independent from TeX Live.

Mac OS X

TeX works very well on the Mac, and this combination is a favourite of many linguists. Mac users should install MacTeX, which is a distribution of TeX Live customized specifically for Mac OS X. It includes both the cross-platform TeXworks editor and the Mac-specific TeXShop editor that originally inspired TeXworks. It also includes BibDesk, a very useful application for managing BibTeX bibliographies that can also organize PDFs and link to online article providers.

The TeX Live Utility is a Mac-only graphical user interface to the ‘tlmgr’ program for installing, updating, and removing packages from TeX Live installations. This program makes package management on the Mac much easier.

MacTeX users should read the TeXnical Help page for MacTeX and consider joining the MacOSX-TeX mailing list.

Windows

TeX and LaTeX are not quite as well developed on Windows as they are on Mac OS X and on Linux. This is because TeX and LaTeX have lived on Unix systems since the mid-1980s, and have consequently evolved alongside the development of Unix. In contrast, TeX and LaTeX only came to Windows about a decade ago.

MiKTeX is the most popular TeX distribution for Windows. It has a very good package management system that can automatically download and install packages when they are first used. Current versions of MiKTeX support Windows XP (SP3), Windows Vista (SP2), and Windows 7; older Windows systems (95, 98, Me, 2000, etc.) are not supported. MiKTeX supplies the TeXworks editor as well.

Linux

TeX and LaTeX are at home on Linux, since Unix-like operating systems have been the home of TeX for the last few decades. Linux distributions usually have their own adaptations of TeX Live which are installed using the Linux distro’s package management system. The alternative is to install TeX Live directly. It is important to use a TeX Live installation rather than an older teTeX installation. The teTeX system is archaic and unsupported, but it is confusingly still provided by a few Linux distributions.

Linux users can use TeXworks for editing documents, as well as any of a myriad of text editors. GNU Emacs and XEmacs both support AUCTeX for editing TeX and LaTeX documents. Vim also has an extensive set of tools for TeX and LaTeX editing in Vim-LaTeX.

Output

The majority of TeX and LaTeX users produce PDF output using the pdfTeX extension and its offspring. The traditional mechanism was to produce a DVI file (DeVice Independent file) from TeX or LaTeX and then convert that to PostScript using the dvips conversion program. Later the process shifted to converting DVI files to PDF with the dvipdfm(x) programs, but pdfTeX has almost completely replaced the use of DVI files today.

There are many TeX and LaTeX packages that provide support for various PDF features such as hyperlinking, page transitions, audio and video embedding, and so forth. The most common requirements are to include hyperlinks for figures and references and to produce PDF bookmarks for section headings (the “table of contents” links displayed on the side of a PDF in Adobe Acrobat and other viewers). These are typically produced using the hyperref package which has very thorough documentation.
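A minimal sketch of the common case described above, assuming pdfLaTeX and the hyperref package: cross-references become clickable links, and each section heading produces a PDF bookmark. The option values, section titles, and label names are illustrative choices only.

```latex
% Compile twice with pdflatex so that cross-references resolve.
\documentclass{article}
\usepackage[
  colorlinks=true,   % color the link text instead of drawing boxes
  linkcolor=blue,    % internal links (sections, figures, tables)
  urlcolor=blue,     % external URLs
  bookmarks=true,    % PDF bookmarks from section headings
  pdftitle={A sample linguistics paper}
]{hyperref}
\begin{document}
\tableofcontents

\section{Data}\label{sec:data}
The analysis follows in Section~\ref{sec:analysis}.

\section{Analysis}\label{sec:analysis}
The data in Section~\ref{sec:data} can be found in packages
archived at \url{https://ctan.org}.
\end{document}
```

hyperref is usually loaded near the end of the preamble, since it redefines commands from many other packages.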

Packages

TeX and LaTeX packages, software, and documentation are all available in a centralized repository called CTAN, the Comprehensive TeX Archive Network. This repository is very large and constantly growing, so it can be somewhat overwhelming at first. The TeX Catalogue Online provides a set of guides to all the packages available on CTAN, arranged by topic, name (“alpha”), and directory hierarchy. Each TeX distribution includes a subset of the packages available on CTAN, and each provides a mechanism for updating packages and adding new ones.

The following lists give a selection of packages and macros that are particularly useful for linguists. Most of these are already included in TeX installations like MacTeX and MiKTeX, so they are immediately available for use; in that case the package’s home is on CTAN, though an independent website may also exist. The exceptions are John Frampton’s packages, the OTtablx package, and avm.sty, which must be obtained separately.

General packages

  • booktabs — Linguists use tables extensively to present data. The booktabs package by Simon Fear provides a means for typesetting high-quality tables as seen in journals and books. It supplants the default LaTeX tables which are rather ugly.
  • \multicolumn — A command built into LaTeX (not to be confused with the multicol package, which sets running text in several columns) for table cells that span multiple columns, with an argument to adjust the horizontal alignment and vertical rules of the spanned cell.
  • multirow — A companion to \multicolumn which supports table cells that span multiple rows.
  • longtable — The longtable package is one of several which support the typesetting of tables that are longer than a single page. This package ensures that column widths are the same across pages, and supports different headers and footers for different page types (e.g. “continued on next page” for intermediate pages).
  • graphicx — An indispensable package for including arbitrary graphics such as spectrograms, ultrasound images, and photographs. Allows for rotating, flipping, and scaling of graphics as well as many other operations.
  • comprehensive symbols — A comprehensive list of symbols available through a huge number of different packages.
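The table packages above can be combined as in the following sketch, which uses booktabs rules, a \multicolumn header, and \multirow cells. The paradigm and its forms are invented purely for illustration.

```latex
\documentclass{article}
\usepackage{booktabs}   % \toprule, \midrule, \cmidrule, \bottomrule
\usepackage{multirow}   % \multirow
\begin{document}
\begin{table}
  \centering
  \caption{A hypothetical verb paradigm (invented forms).}
  \begin{tabular}{llll}
    \toprule
          &        & \multicolumn{2}{c}{Number} \\  % header spanning two columns
    \cmidrule(l){3-4}                               % partial rule under columns 3-4
    Tense & Person & Singular & Plural \\
    \midrule
    \multirow{2}{*}{Present} & 1 & taka  & takan   \\  % cell spanning two rows
                             & 2 & taki  & takin   \\
    \midrule
    \multirow{2}{*}{Past}    & 1 & takas & takasan \\
                             & 2 & takis & takisan \\
    \bottomrule
  \end{tabular}
\end{table}
\end{document}
```

Note that booktabs tables avoid vertical rules and use only a few horizontal ones, which is much of what gives them their journal-like appearance.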

Linguistics packages

  • ExPex, pst-jTree, and pst-asr — John Frampton maintains these three packages for numbered and glossed examples, linguistic trees, and autosegmental representations, respectively.
  • gb4e — The most popular package for glossed and numbered example sentences.
  • linguex — Wolfgang Sternefeld’s package for glossed and numbered example sentences. Includes ps-trees for drawing syntax trees, based on Emma Pease’s tree-dvips (see below).
  • xytree — A package by Koaunghi Un which draws syntactic trees using the xy-pic drawing package.
  • qtree — Alexis Dimitriadis and Jeffrey Siskind’s package for drawing syntax trees. Not very customizable, but has a simple syntax which is easy to learn.
  • pst-qtree — David Chiang and Daniel Gildea’s adaptation of qtree to use the PSTricks drawing system.
  • tikz-qtree — David Chiang went on to adapt qtree to work with the PGF-TikZ drawing system.
  • tree-dvips — Emma Pease’s syntax tree drawing package.
  • rrgtrees — David Gardner’s package for drawing Role and Reference Grammar (RRG) tree diagrams.
  • avm.sty — Documentation on using Chris Manning’s package for producing attribute-value matrices as used in e.g. HPSG and LFG. This package is useful for presenting any kind of complex data in matrices, for example lexical structure, not just for these particular theories.
  • tipa — A package from Fukui Rei for typesetting IPA and other phonetic symbols. This package is not necessary if Xe(La)TeX is used, in which case IPA characters (in Unicode) can be simply entered as-is and then an appropriate font used for the text which includes the relevant IPA characters. TeXnicians who are still using plain (La)TeX will still need to use tipa. Interestingly, tipa and its predecessor wsuipa had some influence on how IPA was standardized in Unicode.
  • OTtablx — There are several packages floating around which support the typesetting of optimality theoretic tableaux, but this package by Nathan Sanders is particularly good. It is not yet on CTAN, so it must be obtained from the author’s website.
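To give a feel for a few of the packages listed above, here is a small sketch that combines tipa, gb4e, and tikz-qtree in one pdfLaTeX document. The example sentence, labels, and tree are illustrative only. Loading gb4e last and following it with \noautomath is a common precaution, assumed here, against gb4e's special handling of the ^ and _ characters interfering with other packages.

```latex
\documentclass{article}
\usepackage{tipa}         % \textipa for IPA with plain (pdf)LaTeX
\usepackage{tikz-qtree}   % \Tree with qtree-style bracket syntax
\usepackage{gb4e}         % numbered and glossed examples; loaded last
\noautomath               % switch off gb4e's automatic math mode for ^ and _
\begin{document}

% tipa: T, I, N are ASCII shortcuts for theta, small capital I, and eng.
The word \emph{thing} is transcribed as \textipa{[TIN]}.

% gb4e: a numbered example with word-by-word gloss and free translation.
\begin{exe}
  \ex\label{ex:dog}
  \gll Der Hund bellt.\\
       the dog bark.\textsc{3sg}\\
  \glt `The dog is barking.'
\end{exe}

% tikz-qtree: node labels follow a dot, and leaves need a trailing space.
\begin{exe}
  \ex
  \Tree [.S [.NP [.D the ] [.N dog ] ] [.VP [.V barks ] ] ]
\end{exe}

Example (\ref{ex:dog}) can be cited by its label.
\end{document}
```

With XeLaTeX, the tipa line could instead be typed directly in Unicode, as described in the tipa entry above.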

Pronunciation

TeX was intended to be pronounced as [tɛx] by Donald Knuth, based on the Greek word τέχνη /ˈtɛxni/ (Anc. Grk. /tékʰneː/) meaning ‘art’ or ‘craft’. English-speaking linguists variously pronounce it as [tɛx] or [tɛk], and *[tɛks] is typically discouraged. Interestingly, *[tɛχ] doesn’t seem to occur among linguists despite the temptation to interpret the letter X or χ as [χ].

The -TeX root of LaTeX is pronounced the same as TeX above. The prefix La-, from author Leslie Lamport’s last name, is variously pronounced as [lei̯], [la], or occasionally [læ]. Stress is usually initial.

The author of XeTeX, Jonathan Kew, says that he intended the Xe- prefix to be pronounced as [zi], although some people pronounce it as [ksi] despite this onset not fitting English phonotactics. As with LaTeX, stress is usually initial. The name XeLaTeX is compositional, thus e.g. [zilei̯tɛx], with either initial or penultimate stress.