Chapter 3. The Vertical Direction: Concept Hierarchies.

The previous chapter demonstrated a technique whereby a computer, with very little knowledge of how English works, can tell us that hungary, poland, romania, bulgaria and czechoslovakia are all related. So far so good, but this raises the question: what are they? The next step is a technique for working out that they are all European countries, producing something like the following output from an interactive demo (now deprecated):
Input words (nouns): hungary, poland, romania, bulgaria, czechoslovakia

 Score  Class label
 3.500  European country, European nation
 1.250  Balkan country, Balkan nation, Balkans, Balkan state
 0.972  country, state, land, nation
 0.458  administrative district, administrative division, territorial division
 0.268  district, territory
 0.176  region
 0.124  location
 0.092  object, physical object
 0.072  entity, something

Produced by the Infomap group, CSLI, Stanford University

Or, if we want to know what cheese and its five nearest neighbours from the left-hand cluster in Figure 2.6 (butter, milk, meat, bread and wine) have in common, we would like to be able to take these words and classify them as follows:

Input words (nouns): cheese, butter, milk, meat, bread, wine

 Score  Class label
 2.500  foodstuff, food product
 2.250  dairy product
 0.944  food, nutrient
 0.472  substance, matter
 0.334  object, physical object
 0.250  beverage, drink, drinkable, potable
 0.248  entity, something
-0.250  money
-0.250  combatant, battler, belligerent, fighter, scrapper
-0.250  baked good, baked goods
-0.250  dark red


Needless to say, there is an algorithm behind this class-labelling trick, and it relies on a lot of careful work by many Princeton students over many years. The goal of this chapter is to describe this work and the mathematical ideas behind it. Two of the most important characters in this story, Aristotle and Darwin, are not usually thought of as mathematicians at all, but they nonetheless described their ideas using mathematical models that were, if anything, far ahead of their time.

The idea is that concepts can be arranged into a hierarchy, a tree of meaning whose trunk and branches correspond to general concepts and whose twigs and leaves correspond to particular or specific concepts. We shall see that there are many examples of this sort of structure in common use, from the famous 'Tree of Life' to postal addresses and computer file systems. If the symmetric relationships of the previous chapter can be thought of as level or horizontal in character, the relationship between a child node and a parent node in a hierarchy (most clearly exemplified in the relationship between a species and its genus in the Tree of Life) can be thought of as a vertical relationship.
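The vertical, child-to-parent relationship is easy to store in a program: give every node a pointer to its parent, and a node's full "address" is the chain of parents leading up to the root, much as with a file path or a postal address. The following Python sketch assumes each node has a single parent; the `PARENT` dictionary, the helper names and the node labels are hypothetical toy data loosely based on the class labels in the first table, not WordNet's actual hierarchy.

```python
# Hypothetical toy hierarchy stored as a child -> parent map.
# Node names loosely echo the class labels in the first table; this is
# NOT WordNet's real hierarchy, which has more levels and multiple parents.
PARENT = {
    "poland": "European country",
    "hungary": "European country",
    "European country": "country",
    "country": "region",
    "region": "location",
    "location": "entity",
}


def path_from_root(node, parent):
    """Return the nodes from the root of the hierarchy down to `node`."""
    chain = [node]
    # Climb one vertical step at a time until we reach a node with no parent.
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return list(reversed(chain))


def address(node, parent, sep="/"):
    """Write the root-to-node path in the style of a file-system path."""
    return sep + sep.join(path_from_root(node, parent))


if __name__ == "__main__":
    print(address("poland", PARENT))
    # prints /entity/location/region/country/European country/poland
```

The `while` loop terminates only because a hierarchy of this kind contains no cycles, which is exactly the property discussed under antisymmetric relationships.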


1. Phylogeny, or the Tree of Life

This section describes how Aristotle, and later Charles Darwin, used the idea of a tree to describe how a species inherits properties from its genus. Such a structure is called a taxonomy.
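Inheritance down a taxonomy can be sketched in a few lines: a property stated once for a genus holds for every species beneath it, so collecting a node's properties means walking up through all of its ancestors. This minimal Python sketch assumes a single-parent hierarchy; the data is a hypothetical toy chain following the traditional Tree of Porphyry (substance, body, living body, animal, human), with each level adding its differentia, and the function name is our own.

```python
# Hypothetical toy taxonomy: each node points to its genus (its parent).
# The chain follows the traditional Tree of Porphyry.
PARENT = {
    "human": "animal",
    "animal": "living body",
    "living body": "body",
    "body": "substance",
}

# Each property is recorded only at the level where it is introduced
# (the differentia that separates that node from its siblings).
PROPERTIES = {
    "body": {"corporeal"},
    "living body": {"animate"},
    "animal": {"sensitive"},
    "human": {"rational"},
}


def inherited_properties(node, parent, properties):
    """Collect the properties of `node` and of every ancestor above it."""
    result = set()
    current = node
    while current is not None:
        result |= properties.get(current, set())
        current = parent.get(current)  # None once we pass the root
    return result


if __name__ == "__main__":
    print(sorted(inherited_properties("human", PARENT, PROPERTIES)))
    # prints ['animate', 'corporeal', 'rational', 'sensitive']
```

Storing each property only once, at the most general node where it applies, is what makes a taxonomy economical: adding a new species under "animal" automatically gives it everything that animals, living bodies and bodies have.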

2. Directed Relationships

The parent-child relationships in a taxonomy can be represented as another kind of graph, but now the links in the graph are directed rather than symmetric. A good example of a graph with directed links is the World Wide Web, whose link structure can be used to estimate the popularity of different webpages. This is how Google's original PageRank algorithm worked.
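The idea behind PageRank can be sketched with power iteration: every page starts with an equal share of rank, and on each round passes most of its rank along its outgoing links, while a small fraction is spread evenly over all pages. This is a simplified textbook version written for illustration, not Google's implementation; the `pagerank` function, the toy link graph and the damping factor of 0.85 (a value commonly used in descriptions of PageRank) are assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate PageRank scores by power iteration.

    `links` maps each page to the list of pages it links to. Pages with no
    outgoing links spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Follow the directed links: split p's rank among its targets.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: treat it as linking to every page.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    scores = pagerank(web)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 3))
```

In the toy graph, page c, which three of the four pages link to, comes out on top, while page d, which nothing links to, keeps only the baseline share. The direction of each link matters: a link from d to c raises c's score but does nothing for d.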

3. Antisymmetric Relationships and Trees

The relationships in trees are not only directed but also antisymmetric: if A is an ancestor of B, then B cannot be an ancestor of A.
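Antisymmetry can be checked mechanically: compute each node's ancestors by following parent links, then confirm that no two nodes appear among each other's ancestors. The Python sketch below is a hypothetical illustration; it allows a node to have several parents, and its function names and toy data are our own.

```python
def ancestors(node, parents):
    """All nodes reachable from `node` by following parent links.

    `parents` maps a node to a list of its parents. The `found` set means
    the search terminates even if the input accidentally contains a cycle.
    """
    found = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


def is_antisymmetric(parents):
    """True if no two nodes are each other's ancestors.

    If some node is its own ancestor, the check fails with b == a, so a
    True result also means the hierarchy contains no cycles.
    """
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    anc = {n: ancestors(n, parents) for n in nodes}
    return all(a not in anc[b] for a in nodes for b in anc[a])


if __name__ == "__main__":
    tree = {"poland": ["European country"], "European country": ["country"]}
    loop = {"a": ["b"], "b": ["a"]}
    print(is_antisymmetric(tree), is_antisymmetric(loop))  # prints True False
```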

4. Representing Linguistic Meaning in Trees

The idea of using a tree to represent meaning in language goes back to Aristotle's Categories, a tradition that was followed by many philosophers and artists.

The Tree of Porphyry, one of the earliest examples of a concept hierarchy. From the ceiling fresco at Schussenried, Germany, by Franz Georg Herrmann (1757), photograph by Jeffrey Garrett (Northwestern University Library, October 2000).

Nowadays linguists use parse trees to represent the grammatical structure of sentences, and semantic knowledge bases contain taxonomies that are directly descended from Aristotle's work.

5. WordNet

The most widely available linguistic taxonomy is the Princeton WordNet.

Some of the uppermost nodes (major categories) in the WordNet noun taxonomy.

WordNet has separate taxonomies for nouns and verbs, and organizes modifiers (such as adjectives) into pairs of antonyms or opposites. The insight that substances and actions often lack 'contraries' or antonyms, but that qualities normally have contraries, is found explicitly in Aristotle's Categories.

6. Finding class labels: Mapping collections of words into WordNet

Coming full circle, this section describes how we can use WordNet to assign a probable class label to a whole list of words.

The neighbours of Poland that we found in Chapter 2 are now classified as European countries.
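The published class-labelling method is described in the Widdows (2003) paper listed below and is not reproduced here. As a rough, hypothetical sketch of the underlying idea, a candidate class can be scored by how many of the input words fall under it and how close above them it sits. Everything in this Python sketch is an assumption: the toy `PARENTS` hierarchy stands in for WordNet, and the 1/distance weighting is our own simplification, so its numbers will not match the demo tables at the start of the chapter.

```python
from collections import deque

# Hypothetical toy hierarchy (child -> list of parents), standing in for
# WordNet. Real WordNet has many more levels and senses per word.
PARENTS = {
    "poland": ["European country"],
    "hungary": ["European country"],
    "romania": ["European country"],
    "czechoslovakia": ["European country"],
    "bulgaria": ["European country", "Balkan country"],
    "European country": ["country"],
    "Balkan country": ["country"],
    "country": ["region"],
    "region": ["location"],
    "location": ["entity"],
}


def hypernym_distances(word, parents):
    """Breadth-first search upwards: map each ancestor of `word` to the
    smallest number of parent links needed to reach it."""
    distances = {}
    seen = {word}
    queue = deque([(word, 0)])
    while queue:
        node, depth = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                distances[parent] = depth + 1
                queue.append((parent, depth + 1))
    return distances


def class_labels(words, parents, top_n=3):
    """Rank candidate class labels for a list of words.

    Simplified, hypothetical scoring: each word adds 1/distance to every
    class above it, so a class that sits close above many of the words
    beats both narrow classes (few words) and very general ones (far away).
    Words missing from the hierarchy contribute nothing.
    """
    scores = {}
    for word in words:
        for label, distance in hypernym_distances(word, parents).items():
            scores[label] = scores.get(label, 0.0) + 1.0 / distance
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    neighbours = ["hungary", "poland", "romania", "bulgaria", "czechoslovakia"]
    for label, score in class_labels(neighbours, PARENTS):
        print(f"{label:20s} {score:.3f}")
```

On this toy data, "European country" comes out on top with a score of 5.0 (one step above all five words), ahead of "country" (2.5) and "region" (about 1.667). The very general classes still receive some credit, but less the further they sit above the words, which mirrors the declining scores for "location" and "entity" in the first table.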


The most important reading for this chapter is Aristotle's Categories.

The following paper describes and evaluates the class-labelling work more technically:

Dominic Widdows. Unsupervised methods for developing taxonomies by combining syntactic and statistical information. In Proceedings of HLT/NAACL 2003, Edmonton, Canada, June 2003, pages 276-283.
