compile time or compile-time?

Binary functions

Before talking about binary functions, I’m going to discuss unary functions. The key point is that unary functions work really well in object oriented languages. Consider the class hierarchy:

abstract class Drawable has 
    member function Draw (x, y of type float) returning nothing
        with no implementation
abstract class Shape extends Drawable and has 
    member function Resize(newsize of type float) with no implementation
class Circle extends Shape and has 
    member variable mRadius of type float and 
    defines implementations for Draw and Resize
class Square extends Shape and has 
    member variable mSideLength of type float and 
    defines implementations for Draw and Resize
class Cross extends Drawable and 
    defines an implementation for Draw

We may now declare a function DrawInCenterOfWindow taking a Drawable reference. DrawInCenterOfWindow may call any of the functions defined in Drawable (like Draw). Using dynamic dispatch, if we pass DrawInCenterOfWindow an object with class Circle, when Drawable calls its parameter’s Draw function, the virtual table lookup yields Circle’s Draw implementations. Passing a Square, or some other class not known at the time of compiling DrawInCenterOfWindow works just as well.

Another function, ResizeToFillWindow, which takes a Shape reference as a parameter, calls the Resize member of its parameter. ResizeToFillWindow may be called with a Circle or a Square but not a Cross.

Lets say that DrawInCenterOfWindow declares a local variable Center of type Cross. In some languages, calling Center’s Draw function bypasses the dynamic dispatch mechanism and calls Cross’ Draw directly, since Center’s actual class can be established at compile-time. Other languages will use dynamic dispatch everywhere, which can be slower.

Member functions conceptually take a hidden first parameter, called ‘this’. In the pseudocode

function DrawInCenterOfWindow(drawable_param of type 
                                  Drawable reference) returns nothing
    drawable_param.Draw(0.0, 0.0)

‘this’ is set to the parameter drawable_param when calling the function ‘Draw.’ The nice thing is that the type of ‘this’ varies covariantly, by which I mean, Drawable’s Draw’s ‘this’ has static type Drawable,

Shape’s Draw’s ‘this’ has static type Shape, Circle’s Draw’s ‘this’ has static type Circle, etc. Notice that this is unlike the other parameters of member functions. Consider this other class hierarchy:

abstract class HasEquals has 
    member function IsEqualTo(rhs of type HasEquals reference) returning a boolean
class String extends HasEquals and 
    implements IsEqualTo
class Matrix extends HasEquals and 
    implements IsEqualTo

String’s IsEqualTo’s parameter must take a HasEquals parameter. Otherwise, String would not be a subtype of HasEquals. This is because a function taking a HasEquals reference as a parameter could call the parameter’s IsEqualTo function with any HasEquals reference. For example:

function CompareToAMatrix(rhs of type HasEquals reference) returns boolean
    local variable compare_with is type Matrix
    return rhs.IsEqualTo(compare_with)

function OtherFunction returns nothing
    local variable string_variable is type String
    PrintBoolean(CompareToAMatrix(string_variable))

The key here is that the only reason the ‘this’ is allowed become more specific in the subclass is because the dynamic dispatch mechanism insures that the member function implementation actually executed depends on the true, dynamic type of the object. Lets say class D extends class C, and that class C has a member function F. The rules needed for type soundness are:

If F in C takes in parameter IN of type A, then F in D must take parameter IN of type B where A extends B. This is called contravariance. Usually B will be equal to A, in fact often languages will require B=A, but that only the weaker condition of contravariance is required for type soundness.

If F in C takes returns type A (or outputs an object of type A in some other way), then F in D must return a type extending A. This is called covariance.

If F in C both inputs and outputs an object, the F in D must input and output an object of the same type. This is called matching.

Note that member variables often may be both read and written, in which case they follow the rule for matching, i.e. the member variables in D have the same type as the member variables in C.

This leads us to the problem. In the first case, where we had functions only taking one object reference parameter, specifically ‘this,’ (‘unary’ functions, though I’m abusing the term) everything worked nice and covariantly. String’s IsEqualTo function, on the other hand, has to be prepared to deal with a parameter which could any object whose class extends HasEquals (a ‘binary’ function since it depends on two object references: ‘this’ and the parameter). How a language solves this problem greatly affects its expressiveness -- see this interview with Alexander Stepanov (the creator of STL) at http://www.stlport.org/resources/StepanovUSA.html (search for ‘max’)

Really there are two problems: How can you write a function which can manipulate a child class efficiently while still being able to write a function which can manipulate any object of the base class. The second case may be further divided by whether the actual type of the object is known at compile time. Below I will refer to the three cases as (from potentially most efficient to least efficient):

    ‘manipulate child case’
    ‘static base manipulation case’
    ‘dynamic base manipulation case’

In each case, the technique could either be (from best to worst)

    Fast static dispatch
    Normal dynamic dispatch
    Casts
    Slow dispatch
    Can’t handle

Note that static dispatch may also allow inlining and other optimizations. Also note that casts may be made fast by making them unsafe.

How do various languages deal with this? There are many different answers. Some languages have multiple mechanisms (C++, for example) -- this makes the language larger (bad), but gives the programmer a choice of tradeoffs (good).

Multiple template instantiations (C++)

In this case, you don’t worry about making all classes which have an IsEqualTo function extend from a common base class. Instead, any function which wants to call IsEqualTo while still being generic uses a templated type. Big Example:

class String has member function IsEqualTo(rhs of type String) returning a boolean
class Matrix has member function IsEqualTo(rhs of type Matrix) returning a boolean
class Window

(no member function IsEqualTo)

class EqualToAnything has template<class T> member function IsEqualTo(rhs of type T) returning a boolean
template<class T> function UsesIsEqualTo(a,b of type T) returning a boolean
    return a.IsEqualTo(b)
template<class T, class U> function UsesIsEqualDifferently(a of type T, b of type U) returning a boolean
    return a.IsEqualTo(b)

function Test
    local variables a,b are type String
    local variables c,d are type Matrix
    local variable e is type EqualToAnything
    local variables f,g are type Window

(these work:)

UsesIsEqualTo(a,b)

(actually calls UsesIsEqualTo<String>(a,b) )

UsesIsEqualTo(c,d)

(actually calls UsesIsEqualTo<Matrix>(c,d) )

UsesIsEqualDifferently(a,b)

(actually calls UsesIsEqualDifferently<String,String>(a,b) )

UsesIsEqualDifferently(c,d)

(actually calls UsesIsEqualDifferently<Matrix,Matrix>(c,d) )

UsesIsEqualDifferently(e,a)

(actually calls UsesIsEqualDifferently<EqualToAnything,String>(e,a)

which calls EqualToAnything’s IsEqualTo<String>)

UsesIsEqualDifferently(e,c)

(actually calls UsesIsEqualDifferently<EqualToAnything,Matrix>(e,c)

which calls EqualToAnything’s IsEqualTo<Matrix>)

(these don’t:)

UsesIsEqualTo(a,c)

(Error: Can’t figure out T)

UsesIsEqualTo(e,f)

(Error: Window has no member ‘IsEqualTo’)

UsesIsEqualDifferently(a,c)

(actually calls UsesIsEqualDifferently<String,Matrix>(a,c)

but String’s IsEqualTo takes a String not a Matrix)

(Error: String’s IsEqualTo takes the wrong type)

UsesIsEqualDifferently(a,e)

(actually calls UsesIsEqualDifferently<String,EqualToAnything>(a,c)

but String’s IsEqualTo takes a String not a EqualToAnything)

(Error: String’s IsEqualTo takes the wrong type)

UsesIsEqualDifferently(c,a)

(actually calls UsesIsEqualDifferently<Matrix,String>(c,a)

but Matrix’s IsEqualTo takes a Matrix not a String)

(Error: Matrix’s IsEqualTo takes the wrong type)

Compiler Implementation:

The definition of templated functions is kept around (and therefore the compiler must be able to have read the definition of the function at the time of compilation of any call to that function, hence the bodies of templated functions are kept in header files, not just the declarations of their parameters and return types). Any time the compiler comes across a call to the templated function, it deduces the type to give to any type parameters (such as T in the example) using the unification algorithm (which is cool, look it up) and compiles the function with that assignment. This causes there to be many versions of the function compiled, which can improve speed at the expense of code size. Since it doesn’t (fully) compile the body of the function, it can defer checking the call to IsEqualTo until the type parameters are known.

Manipulate child case Fast static dispatch

Static base manipulation case Fast static dispatch

Dynamic base manipulation case Can’t handle

Pros: Can inline functions

Allocation is easier since class size known when type is instantiated

Don’t need a common class, just make sure any class passed has members which may be used in the right way

Can use as a mechanism for pushing work to compile-time or algorithmically generating code, like a macro processor, but more awkward

Cons: Does not handle ‘dynamic base manipulation case’

Can occasionally pass a class which only coincidentally has the right members

Code bloat

Errors only occur when function is instantiated. These errors may either be caused by mistake in the function or a type parameter with the wrong members. Either way the error messages are hard to understand.

Code depends on body of templated functions.

Run-time type casts (Java, C++, Sather, etc.)

Example:

abstract class HasEquals has member function IsEqualTo(rhs of type HasEquals reference) returning a boolean

class String extends HasEquals and implements IsEqualTo

class Matrix extends HasEquals and implements IsEqualTo

member of String function IsEqualTo(rhs of type HasEqual reference) returns boolean

local variable post_cast is a String reference

post_cast = cast to String reference(rhs)

(if rhs is a Matrix, either generates a run-time error or is silently buggy)

(now use ‘this’ and ‘post_cast’, both of which have type String)

Another syntax which supports the same functionality as these casts (in a way that some would argue is nicer) is Sather’s typecase construction.

Manipulate child case Casts

Static base manipulation case Casts

Dynamic base manipulation case Casts

Pros: Good at handling ‘dynamic base manipulation case’

Cons: Either unsafe (like most C/C++ casts) or has speed hit (like Java’s casts or C++’s dynamic_cast)

Not as efficient as other solutions at handling ‘manipulate child case’ and ‘static base manipulation case.’

Does not detect errors at compile time.

May require additional information stored with virtual table (a very minor issue)

Let parameters vary covariantly (Eiffel)

This is either type unsound (fast but unsafe) or requires a run-time type check (slower, safe, but can only detect errors at runtime). Similar to the previous case but with a false sense of security -- unsafe or slow casts are implicit.

Bridge functions (Generic Java)

A ‘bridge’ function takes the same parameters as the parent class, performs a safe cast (generating a runtime error if the type is wrong), and calls the version with the more specific parameters. This is a bit better than the previous case, since the run-time speed hit can be avoided when dealing with the child class directly.

Example:

abstract class HasEquals has member function IsEqualTo(rhs of type HasEquals reference) returning a boolean

class String extends HasEquals and bridge to function IsEqualTo(rhs of type String reference) returning a boolean

member of String function IsEqualTo(rhs of type String reference) returns boolean

(compare ‘this’ and ‘rhs’ both of which have type String)

(compiler automatically generates:)

member of String function IsEqualTo(rhs is type HasEqual reference) returns boolean

local variable post_cast is a String reference

post_cast = cast to String reference(rhs)

(if rhs is a Matrix, generates a run-time error)

return IsEqualTo(post_cast)

Note that this functionality can be implemented by the user if function overloading is available in the language.

Manipulate child case Fast static dispatch

Static base manipulation case Cast

Dynamic base manipulation case Cast

Pros: Good at handling ‘dynamic base manipulation case’ and ‘manipulate child case’

Cons: Not as efficient as other solutions at handling ‘static base manipulation case.’

Does not detect errors at compile time.

Requires additional information stored with virtual table (a very minor issue)

Only dynamic typing no static typing (Lua, Python)

In this case, the run-time member function lookup (into the virtual table) retrieves a function whose type is (again at run-time) checked against the parameters passed.

Manipulate child case Slow dispatch

Static base manipulation case Slow dispatch

Dynamic base manipulation case Slow dispatch

Pros: Types may vary however you like

Don’t need a common class, just make sure any class passed has members which may be used in the right way

Handles all three cases.

Cons: Subclassing is not necessarily subtyping

Errors are only detected at runtime

Slower

Multiple dispatch (CLOS, Dylan, Cecil)

(I’m less familiar with how this is implemented or the syntax used)

Manipulate child case Fast static dispatch (hopefully)

Static base manipulation case Slow dispatch (?)

Dynamic base manipulation case Slow dispatch

Pros: Flexible

Handles all three cases

If the compiler can deduce the run-time type at compile-time, it has enough information to optimize the ‘manipulate child case’

Cons: Big dispatch tables

Potentially slower -- may rely on a slow dynamic dispatch when it could be more efficient

Probably slow for the ‘static base manipulation case’

Potential

Parameterized types (Sather)

Also called ‘generics.’ Here types may be parameterized by a type (called a type parameter) which is not fixed at the time of declaration of the class. Very little may be assumed about the type parameters when compiling the class, but sometimes that is fine.

Example:

abstract class HasEquals<T> has member function IsEqualTo(rhs of type T reference) returning a boolean

class String extends HasEquals<String> and implements IsEqualTo

Manipulate child case Fast static dispatch

Static base manipulation case Normal dynamic dispatch

Dynamic base manipulation case Normal dynamic dispatch

Note that the speed here depends a lot on the aggressiveness of the compiler and details such as: whether a ‘final’ option is available, how much type inference may be done at compile time, etc.

Pros: Works OK for writing function declarations

Works OK for writing container which don’t need to know anything about what they contain

Cons: Not as specific as below

Don't know anything about T’s type when compiling HasEquals, so:

Not good for writing generic implementations of functions defined in base class

Bounded, recursive parameterized types (Generic Java, Joos)

This is just like the previous case, but an additional constraint may (optionally) be placed on the type parameters. This makes explicit any restrictions which the type parameter must adhere to, which allows the parameterized type to perform more operations with generic arguments. Bounded types help with the binary function problem when the bounds are allowed to be recursive. That is, the type bound can be the class being declared (see the example for how this works).

Example:

abstract class HasEquals<T extends HasEquals<T>> has member function IsEqualTo(rhs of type T reference) returning a boolean

abstract class HasCompare<T extends HasCompare<T>> extends HasEquals<T> has member function IsLessThan(rhs of type T reference) returning a boolean

class String extends HasCompare<String> and implements IsEqualTo, IsLessThan

Note: the unification algorithm allows the compiler to deal with this recursiveness.

Manipulate child case Fast static dispatch

Static base manipulation case Normal dynamic dispatch

Dynamic base manipulation case Normal dynamic dispatch

Note that the speed here depends a lot on the aggressiveness of the compiler and details such as: whether a ‘final’ option is available, how much type inference may be done at compile time, etc.

Pros: Good compile-time checking

Know something about T when compiling HasEquals

Works for both function declarations and bodies

Covariance!

Cons: Unfamiliar idiom

A bit verbose, prone to error

Conclusions and other comments

Note that some form of templates or parameterized types helps remove casts from code using generic containers, in addition to addressing the binary function problem. Containers typically don’t need much information about the type of data they contain, usually just issues concerning storage.

Except "Multiple template instantiations (C++)," and "Only dynamic typing no static typing (Lua)" the techniques above need to be combined with some form of multiple inheritance in practical applications. The issue is that classes need to be descended from one abstract class for every binary operation.

Basically, my preference is for C++’s solution (combining ‘Multiple template instantiations’ with ‘run-time casts’) when you are concerned about speed above all else, including language complexity. For other languages, ‘Bounded, recursive parameterized types’ (combined with multiple inheritance) is a much more object oriented solution which does not sacrificing expressiveness. I’d like to research other macro facilities which give the same speed as C++ but with different syntaxes. In particular, expression templates and traits are great techniques, but which requires quite a lot of work to use.

Also see http://www.cis.ksu.edu/Department/Seminars/garle.html

and http://www.cs.washington.edu/homes/todd/papers/oopsla98.html