Umlaut Latex Bibliography Format

In this chapter we will tackle matters related to input encoding, typesetting diacritics and special characters.

In this document, the term special characters covers all symbols other than the lowercase letters a–z, uppercase letters A–Z, digits 0–9, and English punctuation marks.

Some languages usually need a dedicated input system to ease document writing. This is the case for Arabic, Chinese, Japanese, Korean and others. This specific matter will be tackled in Internationalization.

The rules for producing characters with diacritical marks, such as accents, differ somewhat depending on whether you are in text mode, math mode, or the tabbing environment.

Input encoding

TeX uses ASCII by default, but 128 characters are not enough to support non-English languages. TeX has its own way to handle this, with commands for every diacritical mark (see Escaped codes). If we want accents and other special characters to appear directly in the source file, however, we have to tell TeX that we want to use a different encoding.

There are several encodings available to LaTeX:

  • ASCII: the default. Only bare English characters are supported in the source file.
  • ISO-8859-1 (a.k.a. Latin 1): an 8-bit encoding. It supports most characters for Latin-script languages, but nothing beyond that.
  • UTF-8: a Unicode multi-byte encoding. Supports the complete Unicode specification.
  • Others...

In the following we will assume you want to use UTF-8.

There are some important steps to specify encoding.

  • Make sure your text editor decodes the file in UTF-8.
  • Make sure it saves your file in UTF-8. Most text editors do not make the distinction, but some do, such as Notepad++.
  • If you are working in a terminal, make sure it is set to support UTF-8 input and output. Some old Unix terminals may not support UTF-8. PuTTY is not set to use UTF-8 by default; you have to configure it.
  • Tell LaTeX that the source file is UTF-8 encoded.
\usepackage[utf8]{inputenc}

The inputenc package tells LaTeX which text encoding your files use.
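The steps above can be put together in a minimal sketch. It assumes the file itself is saved as UTF-8; the T1 font encoding line is an addition that the steps above do not mention, included here so that accented letters are taken from real glyphs in the font.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}    % assumption: T1 font encoding for accented glyphs
\usepackage[utf8]{inputenc} % tells LaTeX the source file is UTF-8 encoded
\begin{document}
% Accented letters typed directly, without escape codes:
Füße, café, señor
\end{document}
```

If the editor saved the file in another encoding (for example Latin 1), the same source would typically produce inputenc errors or wrong characters, which is why the editor settings come first in the list.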

The inputenc package also allows the user to change the encoding within the document by means of the \inputencoding command.

\usepackage[utf8]{inputenc}
% ...
% In this area
% The UTF-8 encoding is specified.
% ...
\inputencoding{latin1}
% ...
% Here the text encoding is specified as ISO Latin-1.
% ...
\inputencoding{utf8}
% Back to the UTF-8 encoding.
% ...

Extending the support

The LaTeX support of UTF-8 is fairly specific: it includes only a limited range of Unicode input characters. It only defines those symbols that are known to be available with the current font encoding. You might encounter a situation where using UTF-8 results in an error:

! Package inputenc Error: Unicode char \u8:ũ not set up for use with LaTeX.

This is due to the utf8 definition not necessarily having a mapping of all the character glyphs you are able to enter on your keyboard. Such characters are for example

ŷ Ŷ ũ Ũ ẽ Ẽ ĩ Ĩ

In such a case, you may need to use the utf8x option to define more character combinations. utf8x is not officially supported, but can be viable in some cases. However, it might break compatibility with some packages, such as csquotes.

Another possibility is to stick with utf8 and to define the characters yourself. This is easy:

\DeclareUnicodeCharacter{'codepoint'}{'TeX sequence'}

where codepoint is the Unicode codepoint of the desired character, written in hexadecimal, and TeX sequence is what to print when the character matching the codepoint is met. Codepoints are easy to find on the web. Example:

\DeclareUnicodeCharacter{0177}{\^y}

Now inputting 'ŷ' will effectively print 'ŷ'.
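As a sketch of the same technique applied to the other characters listed earlier, the preamble below declares each one by its codepoint. The T1 font encoding is an assumption added here; the accent commands are the standard text accents. On recent LaTeX releases some of these characters may already be set up, in which case the declarations simply redefine them.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}    % assumption: not required by the text above
\usepackage[utf8]{inputenc}
% Map each Unicode codepoint (hexadecimal) to a TeX accent sequence.
\DeclareUnicodeCharacter{0176}{\^Y}      % Y with circumflex
\DeclareUnicodeCharacter{0177}{\^y}      % y with circumflex
\DeclareUnicodeCharacter{0168}{\~U}      % U with tilde
\DeclareUnicodeCharacter{0169}{\~u}      % u with tilde
\DeclareUnicodeCharacter{1EBC}{\~E}      % E with tilde
\DeclareUnicodeCharacter{1EBD}{\~e}      % e with tilde
\DeclareUnicodeCharacter{0128}{\~I}      % I with tilde
\DeclareUnicodeCharacter{0129}{\~{\i}}   % dotless i with tilde
\begin{document}
ŷ Ŷ ũ Ũ ẽ Ẽ ĩ Ĩ
\end{document}
```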

Escaped codes

In addition to direct UTF-8 input, LaTeX supports the composition of special characters. This is convenient if your keyboard lacks some desired accents and other diacritics.

The following accents may be placed on letters. Although the letter 'o' is used in most of the examples, the accents may be placed on any letter. Accents may even be placed above a "missing" letter; for example, \~{} produces a tilde over a blank space.

The following commands may be used only in paragraph (default) or LR (left-right) mode.

LaTeX command   Sample   Description
\`{o}           ò        grave accent
\'{o}           ó        acute accent
\^{o}           ô        circumflex
\"{o}           ö        umlaut, trema or dieresis
\H{o}           ő        long Hungarian umlaut (double acute)
\~{o}           õ        tilde
\c{c}           ç        cedilla
\k{a}           ą        ogonek
\l{}            ł        barred l (l with stroke)
\={o}           ō        macron accent (a bar over the letter)
\b{o}           o̱        bar under the letter
\.{o}           ȯ        dot over the letter
\d{o}           ọ        dot under the letter
\r{a}           å        ring over the letter (for å there is also the special command \aa)
\u{o}           ŏ        breve over the letter
\v{s}           š        caron/háček ("v") over the letter
\t{oo}          o͡o       "tie" (inverted u) over the two letters
\o{}            ø        slashed o (o with stroke)

To place a diacritic on top of an i or a j, its dot has to be removed. The dotless versions of these letters are produced by typing \i and \j. For example:

  • \^{\i} should be used for i circumflex 'î';
  • \"{\i} should be used for i umlaut 'ï'.
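A short sketch of how these escaped codes look in running text is given below. The sample words are chosen only for illustration; the source is pure ASCII, so it does not depend on any input encoding.

```latex
\documentclass{article}
\begin{document}
% Accents built from escape codes only (ASCII source file).
H\"{a}user, cr\`{e}me br\^{u}l\'{e}e, gar\c{c}on, Erd\H{o}s,
\v{S}koda, sm\o{}rrebr\o{}d, \r{a}, \aa{}.

% Dotless i under an accent:
na\"{\i}ve, ma\^{\i}tre.
\end{document}
```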

If a document is to be written completely in a language that requires particular diacritics frequently, then the right configuration allows those characters to be written more directly in the document. For example, to achieve easier coding of umlauts, the babel package can be configured as \usepackage[german]{babel}. This provides the shorthand "o for \"o. This is very useful if one needs to use some text accents in a label, since no backslash will be accepted otherwise.
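A minimal sketch of the babel shorthand follows. The T1 font encoding line is an assumption added for better umlaut glyphs and hyphenation; the shorthand itself comes from the german option.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}   % assumption: not required for the shorthand itself
\usepackage[german]{babel} % activates the " shorthands
\begin{document}
% Each shorthand below is equivalent to the corresponding \"-accent command.
H"auser, sch"on, "uber
\end{document}
```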

More information regarding language configuration can be found in the Internationalization section.

Less than < and greater than >

The two symbols '<' and '>' are actually ASCII characters, but you may have noticed that they will print '¡' and '¿' respectively. This is a font encoding issue. If you want them to print their real symbol, you will have to use another font encoding such as T1, loaded with the fontenc package. See Fonts for more details on font encoding.

Alternatively, they can be printed with the dedicated commands \textless and \textgreater.
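Both approaches can be seen side by side in the following sketch, which loads the T1 font encoding and also uses the dedicated commands:

```latex
\documentclass{article}
\usepackage[T1]{fontenc} % with T1, < and > typed in text print as themselves
\begin{document}
Typed directly: 1 < 2 > 0

With commands: 1 \textless{} 2 \textgreater{} 0
\end{document}
```

Removing the fontenc line should make the first line show the inverted punctuation marks again, while the second line is unaffected.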

Euro currency symbol

When writing about money these days, you need the euro sign. The textcomp package features a \texteuro command which gives you the euro symbol as supplied by your current text font. Depending on your chosen font, this may be quite far from the official symbol.

An official version of the euro symbol is provided by eurosym. Load it in the preamble (optionally with the official option):

\usepackage[official]{eurosym}

Then you can insert it with the \euro{} command. Finally, if you want a euro symbol that matches the current font style (e.g., bold, italics, etc.), you can use a different option:

\usepackage[gen]{eurosym}

Again, you can insert the euro symbol with \euro{}.

Alternatively you can use the marvosym package which also provides the official euro symbol.

\usepackage{marvosym}
% ...
\EUR{}

Now that you have succeeded in printing a euro sign, you may want the '€' on your keyboard to actually print the euro sign as above. There is a simple method to do that. You must make sure you are using UTF-8 encoding along with a working \euro{} or \EUR{} command.

\DeclareUnicodeCharacter{20AC}{\euro{}}
% or
\DeclareUnicodeCharacter{20AC}{\EUR{}}

Complete example:

\usepackage[utf8]{inputenc}
\usepackage{marvosym}
\DeclareUnicodeCharacter{20AC}{\EUR{}}

Degree symbol for temperature and math

The easiest way to print temperature and angle values is to use the \SI command from the siunitx package, which works both in text and math mode:

\usepackage{amsmath}
\usepackage{siunitx}
%...
A $\SI{45}{\degree}$ angle.
It is \SI{17}{\degreeCelsius} outside.

For more information, see the documentation of the siunitx package.

A common mistake is to use the \circ command on its own. It will not print the correct character (though $^\circ$ will). Use the textcomp package instead, which provides a \textdegree command.

\usepackage{textcomp}
%...
A $45$\textdegree angle.

For temperature, you can use the same command or opt for the gensymb package and write

\usepackage{gensymb}
\usepackage{textcomp}
%...
17\,\celsius % best (with textcomp)

Some keyboard layouts feature the degree symbol; you can use it directly if you are using UTF-8 and textcomp. For better results (font quality), we recommend the use of an appropriate font, like lmodern:

\usepackage[utf8]{inputenc}
\usepackage{lmodern}
\usepackage{textcomp}
% ...
17\,°C
17\,℃ % best

Other symbols

LaTeX has many symbols at its disposal. The majority of them are within the mathematical domain, and later chapters will cover how to get access to them. For the more common text symbols, use the following commands:

Not mentioned in the above table, the tilde (~) is used in LaTeX code to produce a non-breakable space. To get a printed tilde sign, write either \~{} or \textasciitilde{}. A visible space ␣ can be created with \textvisiblespace.

For some more interesting symbols, the Postscript ZapfDingbats font is available thanks to the pifont package. Add the declaration to your preamble: \usepackage{pifont}. Next, the command \ding{number} will print the specified symbol. The available symbols and their numbers are tabulated in the pifont documentation.
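As a small sketch of the pifont usage, the document below prints two dingbats. The numbers 51 and 55 are the ZapfDingbats positions commonly used for a check mark and a cross; other numbers can be looked up in the symbol table.

```latex
\documentclass{article}
\usepackage{pifont} % provides \ding{number}
\begin{document}
Passed \ding{51} \quad Failed \ding{55}
\end{document}
```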

In special environments

Math mode

Several of the above and some similar accents can also be produced in math mode. The following commands may be used only in math mode.

When applying accents to the letters i and j, you can use \imath and \jmath to keep the dots from interfering with the accents.
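The sketch below shows a selection of standard math-mode accents, together with the dotless variants. It is not the complete list, only commands that are part of plain LaTeX math mode.

```latex
\documentclass{article}
\begin{document}
% Standard math accents; these commands are valid only in math mode.
$\hat{o}$, $\check{o}$, $\tilde{o}$, $\acute{o}$, $\grave{o}$,
$\dot{o}$, $\ddot{o}$, $\breve{o}$, $\bar{o}$, $\vec{o}$,
$\widehat{oo}$, $\widetilde{oo}$

% Dotless i and j under an accent:
$\hat{\imath}$, $\vec{\jmath}$
\end{document}
```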

Tabbing environment

Some of the accent marks used in running text have other uses in the tabbing environment. In that case they can be created with the following commands:

  • \a' for an acute accent
  • \a` for a grave accent
  • \a= for a macron accent
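A minimal sketch of these commands inside a tabbing environment follows; the sample words are chosen only for illustration.

```latex
\documentclass{article}
\begin{document}
\begin{tabbing}
Spelling \= Accent \kill % template line: sets the tab stop, prints nothing
caf\a'e  \> acute \\
cr\a`eme \> grave \\
M\a=aori \> macron
\end{tabbing}
\end{document}
```

Inside tabbing, \=, \' and \` control tab stops, which is why the \a forms are needed for the accents.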

Unicode keyboard input

Some operating systems provide a keyboard combination to input any Unicode code point, the so-called Unicode compose key.

Many X applications (*BSD and GNU/Linux) support the <Ctrl+Shift+u> combination. A 'u' symbol should appear. Type the code point and press <Enter> or <Space> to actually print the character. Example:

<Ctrl+Shift+u> 20AC <space>

will print the euro character.

Desktop environments like GNOME and KDE may feature a customizable compose key for more memorizable sequences.

Xorg features advanced keyboard layouts with variants that let you enter a lot of characters easily by combining keys with the appropriate modifier, like <AltGr>. The result depends heavily on the selected layout and variant, so we suggest you experiment a bit with your keyboard, preceding every key and dead key with the modifier.

In Windows, you can hold <Alt> and type a numeric code to get a desired character. For example,

<Alt> + 0252

will print the German letter ü.


If you use double quotes, i.e., "...", to delimit the contents of a bibliographic field, you will find that writing

author = "Anna H\"{a}user",

generates a BibTeX error, whereas

author = "Anna H{\"a}user",

does not. I.e., BibTeX isn't quite smart enough on its own to distinguish between the two uses of the " character and needs extra help.

In addition, the contents of bibliographic fields (certainly the author and editor fields, but potentially other fields as well) are frequently used to sort entries alphabetically.

How do BibTeX (and LaTeX) sort characters with umlauts, diacritics, and other special features relative to the 26 basic characters of the Latin alphabet? How is one supposed to sort three authors named, say, Hauser, Häuser, and Hill? For some pretty sound reasons (which are way too ancient and obscure to go into in any adequate detail here; to explore them properly, it's crucial to have Appendix C of the TeXbook handy), a decision was made in the design of BibTeX to "purify" the contents of various fields for sorting purposes. The BibTeX function that does this job really is called purify$! This method conforms, probably not surprisingly, to US and UK sorting criteria; it needn't be "correct" outside of English-speaking regions, as I will note below. The purification works as follows:

  • {\"a}, {\'a}, {\^a}, etc. are all made equivalent to a,
  • {\"o}, {\'o}, and {\^o} are all made equivalent to o,
  • {\l} and {\L} become equivalent to l and L, respectively,
  • {\ss} becomes equivalent to ss,
  • {\aa} becomes equivalent to aa,
  • and so on for all other "accented" characters,
  • finally, any characters that do not fit into this scheme, including ä, are moved to the very end, i.e., after z. This may seem arbitrary and ill-informed from today's vantage point, but back when BibTeX was created more than 20 years ago the only relevant character encoding and sorting system was ASCII.

As you can immediately appreciate, this "purification" step is greatly simplified and made more robust if the "accented" characters are all entered consistently in the manner suggested in the first part of this answer.

Turning to the earlier case of the three authors named Hauser, Häuser, and Hill: how will they appear in a bibliography whose entries are sorted alphabetically by the authors' surnames? If Anna Häuser's last name is entered as H{\"a}user, the three authors will end up being listed as Hauser - Häuser - Hill. In contrast, if her last name had been entered as Häuser, the sorting order would have been Hauser - Hill - Häuser. For most English-speaking readers, the second ordering will look completely wrong.
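The sorting behaviour described above can be tried with a small self-contained sketch. Everything in it apart from the surnames is hypothetical: the file name umlaut-demo.bib, the entry keys, initials, titles and years are placeholders, and the standard plain style is assumed because it sorts by author.

```latex
% Hypothetical test database, written out when the document is compiled.
\begin{filecontents*}{umlaut-demo.bib}
@misc{hill,    author = {Hill, J.},          title = {Placeholder one},   year = {2000}}
@misc{haeuser, author = {H{\"a}user, Anna},  title = {Placeholder two},   year = {2000}}
@misc{hauser,  author = {Hauser, A.},        title = {Placeholder three}, year = {2000}}
\end{filecontents*}

\documentclass{article}
\begin{document}
\nocite{*}                 % list every entry without citing it in the text
\bibliographystyle{plain}  % a standard style that sorts by author
\bibliography{umlaut-demo}
\end{document}
```

After running LaTeX, then BibTeX, then LaTeX again, the H{\"a}user entry should be sorted among the "Hauser" names and before Hill, because its purified sort key starts with the same letters as Hauser. The order between two such entries is then decided by the remaining parts of the sort key, such as first names.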
