Case conversion for monotonic Greek

The basic rules for case conversion of monotonic Greek are simple:

1. all accented letters lose the tonos; and
2. all instances of the dieresis remain.

Difficulties arise with round-trip conversion (converting capitals back to lowercase), with proper names that have accented initial letters, and with diphthongs or double vowels. We will look at each case below.

According to the first rule, the conversions are:

ά > Α,  έ > Ε,  ή > Η,  ί > Ι,  ύ > Υ,  ό > Ο,  ώ > Ω

(This holds when α, ε, ο are not part of a diphthong with ι! See below for those cases.)

Here are some examples:

πάμε τώρα >  ΠΑΜΕ ΤΩΡΑ

έλα πίσω > ΕΛΑ ΠΙΣΩ

φύγε μόνο εσύ > ΦΥΓΕ ΜΟΝΟ ΕΣΥ

According to the second rule, the conversions will be:

ϊ > Ϊ,  ϋ > Ϋ,  ΐ > Ϊ,  ΰ > Ϋ

Here are some examples:

φάγαμε καϊμάκι > ΦΑΓΑΜΕ ΚΑΪΜΑΚΙ

φύσηξε ο μαΐστρος > ΦΥΣΗΞΕ Ο ΜΑΪΣΤΡΟΣ

έχει αϋπνία > ΕΧΕΙ ΑΫΠΝΙΑ

επιτυχής εξαΰλωση > ΕΠΙΤΥΧΗΣ ΕΞΑΫΛΩΣΗ
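The two rules can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `upper_monotonic` is made up for this example, and it assumes the input is monotonic Greek, because it strips every combining acute accent it finds (including any on non-Greek letters).

```python
import unicodedata

TONOS = "\u0301"  # under NFD the tonos decomposes to COMBINING ACUTE ACCENT


def upper_monotonic(text: str) -> str:
    """Uppercase monotonic Greek: drop the tonos, keep the dieresis."""
    # Split each accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Rule 1: remove the tonos. Rule 2: the dieresis (U+0308) is left alone.
    without_tonos = decomposed.replace(TONOS, "")
    # Uppercase the base letters and recompose (e.g. Ι + U+0308 becomes Ϊ).
    return unicodedata.normalize("NFC", without_tonos.upper())


print(upper_monotonic("πάμε τώρα"))
print(upper_monotonic("φύσηξε ο μαΐστρος"))
print(upper_monotonic("επιτυχής εξαΰλωση"))
```

Working on the decomposed form means ΐ and ΰ need no special treatment: once the tonos is removed, only the dieresis is left to recompose with the capital.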

Note that both ϊ and ΐ convert to Ϊ. Similarly, both ϋ and ΰ convert to Ϋ. This means that, unless you have access to a dictionary, you do not know if a capital Ϊ or Ϋ should convert to a lowercase ϊ or ΐ (and ϋ or ΰ respectively). The problem is solved if the capital forms arise from conversion of a correctly keyed in lowercase ΐ or ΰ, but not if the capitals have been keyed in directly: in those cases, it is highly likely that the user typed the standard keystroke for a Ϊ or Ϋ.
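The ambiguity can be made concrete by grouping lowercase letters by the capital they convert to. The following is only a sketch; the helper names `to_upper` and `uppercase_collisions` are hypothetical, and the conversion assumes the tonos-stripping rule described above.

```python
import unicodedata
from collections import defaultdict


def to_upper(ch: str) -> str:
    """Uppercase one letter, dropping the tonos but keeping the dieresis."""
    decomposed = unicodedata.normalize("NFD", ch).replace("\u0301", "")
    return unicodedata.normalize("NFC", decomposed.upper())


def uppercase_collisions(letters: str) -> dict:
    """Map each capital to the lowercase letters that convert to it,
    keeping only capitals reached from more than one lowercase letter."""
    groups = defaultdict(list)
    for ch in unicodedata.normalize("NFC", letters):
        groups[to_upper(ch)].append(ch)
    return {up: lows for up, lows in groups.items() if len(lows) > 1}


collisions = uppercase_collisions("ιίϊΐυύϋΰ")
for capital, sources in collisions.items():
    print(capital, "<-", " ".join(sources))

# The default lowercase mapping can only pick one of the candidates.
print("Ϊ".lower(), "Ϋ".lower())
```

Every capital in the result has two possible lowercase sources, so a reverse mapping that sees only the capital has to guess, or consult a dictionary.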

This assumption is supported by standard keyboard layouts for Greek, which are based on typewriter conventions and do not anticipate case conversion issues. Not surprisingly, the basic Unicode Greek set has only one form of Ϊ and Ϋ, at U+03AA and U+03AB respectively. The extended Unicode Greek set also does not provide “double” Ϊ or Ϋ.[1. It is arguable that this is a semantic distinction, and not a stylistic one, but it is unlikely that Unicode would be amended in this respect. Therefore, the distinction must be maintained within the font file, and supported by the typesetting environment.]

Roundtrip conversion is a constant issue in Greek; it applies throughout this discussion.

 

Initial accented letters

The easiest way to understand case conversion complications in Greek is to think of accented letters having three cases, instead of two:

– lowercase accented letters, where the accented letter and its neighbours are also lowercase;
– initial-case accented letters, where the letter with an accent is a capital but its neighbours are lowercase; and
– uppercase, where only the dieresis is visible, and the accented letter and its neighbours are all capitals.

Here is an example:

Έλα Άννα! Ήρθε η ώρα. 
Ύστερα θα είναι αργά.
Όταν πάμε σπίτι θα καλέσουμε την Ήρα.

These sentences should convert to:

ΕΛΑ ΑΝΝΑ! ΗΡΘΕ Η ΩΡΑ. ΥΣΤΕΡΑ ΘΑ ΕΙΝΑΙ ΑΡΓΑ.
ΟΤΑΝ ΠΑΜΕ ΣΠΙΤΙ ΘΑ ΚΑΛΕΣΟΥΜΕ ΤΗΝ ΗΡΑ.

Note that all the accents in initial capitals disappear. This is the correct behaviour; any deviation is wrong. So, for example, the Foursquare Greek maps and the Wordpress comment form below are wrong:

(This behaviour is not limited to these platforms; it generally happens when a CSS style capitalises text strings that have been keyed in as twin-case.)
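The difference between a generic uppercase mapping and the behaviour described here is easy to reproduce. The sketch below uses Python's built-in `str.upper()`, which applies the default Unicode mapping (ά becomes Ά); whether the platforms mentioned above fail for exactly this reason is an assumption. The helper `upper_without_tonos` is a hypothetical name for the tonos-stripping conversion.

```python
import unicodedata


def upper_without_tonos(text: str) -> str:
    """Uppercase Greek text, dropping the tonos but keeping the dieresis."""
    decomposed = unicodedata.normalize("NFD", text).replace("\u0301", "")
    return unicodedata.normalize("NFC", decomposed.upper())


sentence = "Έλα Άννα! Ήρθε η ώρα."

naive = sentence.upper()                   # keeps the tonos on accented vowels
corrected = upper_without_tonos(sentence)  # drops it, as the rules require

print(naive)
print(corrected)
```

The first line printed still carries accents on the capitals, which is the faulty result; the second matches the expected all-caps form.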

 

Let’s modify our example to include a dieresis:

Έλα Άννα! Ήρθε η ώρα. 
Ύστερα θα φύγει το καΐκι.

Όταν πάμε σπίτι θα χορέψουμε ζεϊμπέκικο.

These sentences should convert to:

ΕΛΑ ΑΝΝΑ! ΗΡΘΕ Η ΩΡΑ. ΥΣΤΕΡΑ ΘΑ ΦΥΓΕΙ ΤΟ ΚΑΪΚΙ.
ΟΤΑΝ ΠΑΜΕ ΣΠΙΤΙ ΘΑ ΧΟΡΕΨΟΥΜΕ ΖΕΪΜΠΕΚΙΚΟ.

Note that the dieresis survives the conversion, as it should. But keep in mind that the highlighted letters are not like the rest of the capitals, since they must ‘gain’ an accent when converted back to lowercase:

ΕΛΑ ΑΝΝΑ! ΗΡΘΕ Η ΩΡΑ. ΥΣΤΕΡΑ ΘΑ ΦΥΓΕΙ ΤΟ ΚΑΪΚΙ.
ΟΤΑΝ ΠΑΜΕ ΣΠΙΤΙ ΘΑ ΧΟΡΕΨΟΥΜΕ ΖΕΪΜΠΕΚΙΚΟ.

It is possible to come up with simple rules to predict that behaviour, for example:

[space]+Ά/Έ/Ή/Ί/Ύ/Ό/Ώ+[class with all lowercase letters]
convert to:
[space]+Α/Ε/Η/Ι/Υ/Ο/Ω+[uppercase letters]

but the roundtrip is not so straightforward. For it to work, the capitals that the accented initials convert to must be different from the ‘plain vanilla’ ones. In other words, the font must include multiple instances of unaccented capitals, addressed through OpenType features. (Using CIDs in this case means that the words will never degrade gracefully in an environment that does not read the feature: either they will lose the accent, or convert to a lowercase form amongst uppercase ones.)
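As a rough illustration of the forward rule and of why the return trip fails, here is a sketch that expresses the pattern as a regular expression. It works on plain strings, not on font glyphs, so it is not the OpenType approach described above; the names `INITIAL_CASE_WORD` and `capitalise_initial_case_words` are invented for this example.

```python
import re
import unicodedata

# An accented capital at the start of a word, followed only by lowercase
# letters. The range α-ω (U+03B1..U+03C9) includes the final sigma.
INITIAL_CASE_WORD = re.compile(r"(?<!\w)[ΆΈΉΊΎΌΏ][α-ωάέήίύόώϊϋΐΰ]+(?!\w)")


def upper_without_tonos(text: str) -> str:
    """Uppercase Greek text, dropping the tonos but keeping the dieresis."""
    decomposed = unicodedata.normalize("NFD", text).replace("\u0301", "")
    return unicodedata.normalize("NFC", decomposed.upper())


def capitalise_initial_case_words(text: str) -> str:
    """Apply the rule only to words that match the initial-case pattern."""
    return INITIAL_CASE_WORD.sub(lambda m: upper_without_tonos(m.group()), text)


converted = capitalise_initial_case_words("Έλα Άννα! Ήρθε η ώρα.")
print(converted)

# Going back with the default mapping restores neither the accent
# nor the initial capital.
print(converted.lower())
```

Only the three initial-case words are converted; the remaining words are left alone because they do not match the pattern. Lowercasing the result shows the loss: the plain capitals carry no trace of the accent they replaced.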

 

Diphthongs and double vowels

(Note: we will refer here to pairs of vowels as diphthongs, although strictly speaking some are diphthongs and some are double vowels or «διγράμματα»: ‘double letters’. Also, the alpha-upsilon and epsilon-upsilon pairs are not diphthongs, but are relevant in this context so we include them for completeness.)

The general rule in case change for monotonic and most cases of polytonic is that the accent disappears from one vowel and is replaced by a dieresis on the next one. The reason has to do with diphthongs that may be read in more than one way.

In the examples below, the syllable containing the diphthong is not stressed:

pronunciation e.g.

αι /e/ παιδότοπος | ΠΑΙΔΟΤΟΠΟΣ

ει /i/ ελλειπτικός | ΕΛΛΕΙΠΤΙΚΟΣ

οι /i/ ποιμαντικός | ΠΟΙΜΑΝΤΙΚΟΣ

ου /oo/ πουθενά | ΠΟΥΘΕΝΑ

αυ /av/ or /af/ αυγό and ναυσικά | ΑΥΓΟ and ΝΑΥΣΙΚΑ

ευ /ev/ or /ef/ ζευγάρωμα and ευχή | ΖΕΥΓΑΡΩΜΑ and ΕΥΧΗ

 

but, depending on the word root, the same pairs may be read as two separate vowels, e.g.

αι /a/ then /i/ μαϊστράλι | ΜΑΪΣΤΡΑΛΙ

ει /e/ then /i/ ζεϊμπέκικο | ΖΕΪΜΠΕΚΙΚΟ

οι /o/ then /i/ βοϊδάμαξα | ΒΟΪΔΑΜΑΞΑ

ου /o/ then /i/ προϋπόθεση | ΠΡΟΫΠΟΘΕΣΗ

and αυ /a/ then /i/ εξαϋλώθηκε | ΕΞΑΫΛΩΘΗΚΕ

ευ /e/ then /i/ [γκεϋζέρ] | [ΓΚΕΫΖΕΡ] (extremely rare)

 

In the examples above, the dieresis makes it clear whether you should read the diphthong as a single sound or pronounce each vowel separately. In that case, when you switch cases you carry the dieresis over to the Iota/Upsilon to indicate the correct pronunciation.

Things get more complicated when one of the two vowels carries an accent. In the more common case, where the diphthong is pronounced as a single sound, things are easy: the second vowel carries the accent, and no additional marking is necessary in either case:

αι παίζουμε | ΠΑΙΖΟΥΜΕ

ει ελλείψεις | ΕΛΛΕΙΨΕΙΣ

οι ποίηση | ΠΟΙΗΣΗ

ου κούνημα | ΚΟΥΝΗΜΑ

αυ ναύλος and καύσιμα | ΝΑΥΛΟΣ and ΚΑΥΣΙΜΑ

ευ ρεύμα and τεύχος | ΡΕΥΜΑ and ΤΕΥΧΟΣ

(Note: αυ / ευ are pronounced as a vowel+consonant combination, but behave as diphthongs.)

 

But what if the diphthong is to be pronounced as two successive vowels? Then there are two cases: either the first vowel carries the stress, or the second. If the first vowel carries the accent, the words look like this (with the older polytonic, which had a degree of superfluity in notation, in square brackets):

αι μάινα [μάϊνα] | ΜΑΪΝΑ [ΜΑΪΝΑ]

ει σέικερ [σέϊκερ] | ΣΕΪΚΕΡ [ΣΕΪΚΕΡ]

οι μπόι [μπόϊ] | ΜΠΟΪ [ΜΠΟΪ]

ου όυ | ΟΫ (rare; not applicable for αυ and ευ)

If the second vowel carries the accent, the words look like this:

αι μαΐστρος | ΜΑΪΣΤΡΟΣ

ει σεΐζης | ΣΕΪΖΗΣ (rare)

οι προΐσταμαι | ΠΡΟΪΣΤΑΜΑΙ

ου προΰπαρξη | ΠΡΟΫΠΑΡΞΗ (not applicable for αυ and ευ)
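The two cases above can be combined into one conversion sketch. It is an illustration under stated assumptions, not a complete implementation: it covers only the pairs listed here (accented α, ε, ο before ι, and accented ο before υ) and, following the lists above, leaves αυ and ευ alone. The names `STRESSED_FIRST_VOWEL` and `upper_with_dieresis` are hypothetical.

```python
import re
import unicodedata

# An accented first vowel followed by a plain iota or upsilon: the pair is
# read as two vowels, so the capital form needs a dieresis on the second.
STRESSED_FIRST_VOWEL = re.compile(r"([άέόΆΈΌ])ι|([όΌ])υ")


def _add_dieresis(match: re.Match) -> str:
    if match.group(1):
        return match.group(1) + "ϊ"
    return match.group(2) + "ϋ"


def upper_with_dieresis(text: str) -> str:
    """Uppercase monotonic Greek, moving the hiatus marking to a dieresis."""
    text = unicodedata.normalize("NFC", text)
    # Case 1: accent on the first vowel, so add a dieresis to the second.
    text = STRESSED_FIRST_VOWEL.sub(_add_dieresis, text)
    # Case 2 (ΐ, ΰ) needs no extra step: stripping the tonos leaves the dieresis.
    decomposed = unicodedata.normalize("NFD", text).replace("\u0301", "")
    return unicodedata.normalize("NFC", decomposed.upper())


for word in ["μάινα", "παίζουμε", "μαΐστρος", "προΰπαρξη"]:
    print(word, "|", upper_with_dieresis(word))
```

Words such as παίζουμε are untouched by the first step because their accent sits on the second vowel, so they come out without a dieresis, as in the earlier list.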