Gekkio's technical blog

Modern technologies with passion

Navigation Menu

Increased compile-time safety with phantom types

Posted on Feb 7, 2013 | 0 comments

Introduction

Using phantom types is a very simple technique that can be used to increase the compile-time safety of code. There are a lot of potential use cases with different levels of complexity, but even a very lightweight usage of phantom types can significantly increase the compile-time safety.

A phantom type is simply a parameterized type with an unused type parameter. For example:

public class MyPhantomType<T> {
  public String sayHello() {
     return "hello";
  }
  // other methods/fields that never refer to T
}

This example class has a type parameter T, but it is never actually used in the code. At first glance this doesn’t seem to be very useful, but that’s not true! All object instances of phantom types carry the type information, so this technique can be used to “tag” values with some extra information that can be compile-time checked. We can of course escape the typing at any time by writing code without the generics, but that should be avoided at all costs. Some languages, such as Scala completely disallow dropping type parameters, so with Scala you would always have to keep the type information completely.

Example use case and implementation

One of the simplest and most useful use cases for phantom types is database ids. If we have a typical three-layer (data, service, web) Java web application, we can gain a lot of compile-time safety by replacing the use of raw ids with phantom types everywhere except at the “endpoints” of the architecture. So, the data layer will put raw ids to database queries, and the web layer might get raw ids from external sources (e.g. HTTP parameters), but otherwise we are always dealing with phantom types. In this example I assume that the database id type is always 64-bit long number. First we’ll need marker interface that will be implemented by all “entity classes”, which should be supported by the phantom type id mechanism:

public interface Entity {
  Long getId();
}

The only purpose of this marker interface is to restrict our phantom typed id to a certain set of tagged classes, and provide the getId method that will be used in the implementation. The actual phantom type is an immutable container for a single id value. The type parameter represents the “target type” of the id, which makes it possible to differentiate between id values of different entities in a compile-time safe way. I like to call this class Ref (shorthand for Reference), but this is just a personal choice.

@Value
@RequiredArgsConstructor(AccessLevel.PRIVATE)
public final class Ref<T extends Entity> implements Serializable {
  public final long id;  

  public static <T extends Entity> Ref<T> of(T value) {
    return new Ref<T>(value.getId());
  }
  public static <T extends Entity> Ref<T> of(long id, Class<T> clazz) {
    return new Ref<T>(id);
  }

  @Override
  public String toString() {
    return String.valueOf(id);
  }

}

This example class uses the @Value and @RequiredArgsConstructor annotations from Project Lombok. If you don’t use Lombok, add the constructor, getter, equals, and hashCode implementations manually (or look for the complete implementation below). Note how the type parameter T is never used anywhere. This also means that you cannot at runtime know the type of the Ref, but that is not usually necessary.

Using the example implementation

Now, we’ll replace the use of raw ids with Refs whenever possible. For example, we could have a service-level method that add a user to a group:

void addUserToGroup(long userId, long groupId);
// without parameter names
void addUserToGroup(long, long);

// VS

void addUserToGroup(Ref<User> userRef, Ref<Group> groupRef);
// without parameter names
void addUserToGroup(Ref<User>, Ref<Group>);

Now, when we want to call this method, we’ll always need Ref objects instead of raw long values. In this example there are two ways to get Ref values.

  1. If you have an instance of the actual object, call Ref.of(object). This is the most common method in layers other than web
  2. If you have a raw id, and you know the target type, call Ref.of(id, TargetType.class). This is usually required in the web layer if the raw id comes from an external source

In order to extract the raw id value from the Ref, you can read the field or use the getter. This is typically only needed right before database query construction.

Closing thoughts

In order to understand the benefits of Refs, try to think about the following cases:

  • What happens if you change the order of parameters in a method call which takes ids of different types? (for example our addUserToGroup)
  • What happens if you change the type of the database id (e.g. Integer -> Long, or Long -> UUID)?
  • How likely will you get runtime errors, if you often have method parameters of the same type as the id, but they are not ids? For example, if you have Integer ids and you mix ids and some sort of list indexes in the same method

In all of these cases, the use of Refs guarantees that you get a compile-time error in places where the code is not correct. In a typical codebase this is a huge win with very little effort. Compile-time safety decreases the cost and difficulty of refactoring, which makes maintaining the codebase much, much easier and safer.

Database ids are just a simple example of phantom types. Other typical use cases include some sort of state machines (e.g. Order<InProcess>, Order<Completed> vs just Order objects), and some kind of unit information for values (e.g. LongNumber<Weight>, LongNumber<Temperature> vs just longs).

Ref<T> implementation (without Lombok)

public final class Ref<T extends Entity> implements Serializable {
  public final long id;

  public static <T extends Entity> Ref<T> of(T value) {
    return new Ref<T>(value.getId());
  }
  public static <T extends Entity> Ref<T> of(long id, Class<T> clazz) {
    return new Ref<T>(id);
  }

  @Override
  public String toString() {
    return String.valueOf(id);
  }

  private Ref(long id) {
    this.id = id;
  }

  public long getId() {
    return this.id;
  }

  @Override
  public int hashCode() {
    return (int) (id ^ (id >>> 32));
  }

  @Override
  public boolean equals(Object o) {
    if (this == o)
      return true;
    if (o == null || o.getClass() != this.getClass())
      return false;
    Ref<?> other = (Ref<?>) o;
    return other.id == this.id;
  }
}

 

Read More

ZK Gritter: Growl-like notifications for ZK apps

Posted on Oct 22, 2012 | 0 comments

Introduction

Jawwa ZK Gritter is an open source library that can be used to add Growl-like notifications to ZK apps. The library provides a simple-to-use server-side Java API that can be used to control notifications in the application. Installation instructions are available in the reference manual.

A sample app that demonstrates all the customization options is available at Github:

https://github.com/Gekkio/blog/tree/master/2012/10/zk-gritter

Usage

The library provides a class called Gritter, which contains multiple static methods. Notifications are added using the builder pattern, so you must first obtain a notification builder, build the notification by calling the appropriate methods, and finally complete the builder and show the notification.

Gritter.notification().withTitle("ZK Gritter demo").withText(LOREM_IPSUM).
  show()

All notifications require both the title and text parameters, and you must remember to call show or the notification will not be actually shown to the user.

Customization

The library supports multiple parameters that can be used to customize the appearance and behaviour of the notifications. The API is fully documented with Javadoc.

Timeout and stickiness

By default all notifications fade out in 6 seconds, but this can be overridden by setting a custom timeout in milliseconds.

Gritter.notification().withTitle("ZK Gritter demo").withText(LOREM_IPSUM).
  withTime(500).show();

You can also add sticky notifications that don’t fade out at all unless the user closes them or they are removed programmatically.

Gritter.notification().withTitle("ZK Gritter demo").withText(LOREM_IPSUM).
  withSticky(true).show();

All visible notifications can be cleared manually. Calling removeAll removes both normal and sticky notifications.

Gritter.removeAll();

Custom CSS

The sclass parameter can be used to set a custom CSS class.

<style>
  .gritter-red .gritter-top {
    background-image: url(gritter-red.png);
  }
  .gritter-red .gritter-item {
    background-image: url(gritter-red.png);
  }
  .gritter-red .gritter-bottom {
    background-image: url(gritter-red.png);
  }
</style>
Gritter.notification().withTitle("ZK Gritter demo").withText(LOREM_IPSUM).
  withSclass("gritter-red").show();

Custom image

Notifications can also include an optional 48x48px image.

Java code:

Gritter.notification().withTitle("ZK Gritter demo").withText(LOREM_IPSUM).
  withImage("/warning.png").show();

Read More

Advanced ZK: Asynchronous UI updates and background processing – part 2

Posted on Oct 20, 2012 | 2 comments

Introduction

In part 1 I showed how server push and threads can be used to execute background tasks in a ZK application. However, the simple example had a major flaw that makes it a bad approach for real-world applications: it starts a new thread for each background task.

JDK5 introduced the ExecutorService class, which abstracts away the threading details, and gives us a nice interface which can be used to submit tasks for background processing.

In this blog post I will describe the most important parts of creating a ZK app, which contains a background task that takes a string, and returns it in uppercase. The complete sample project is available at Github:

https://github.com/Gekkio/blog/tree/master/2012/10/async-zk-part-2

1. Create an ExecutorService instance

First we need an ExecutorService that we can use in our ZK code. In most cases we want a shared singleton instance, which could be configured and managed by dependency injection (e.g. Spring). It is very important to make sure that the ExecutorService is created only once, and it’s shut down properly with the application.

In this sample project I will use a simple holder class, which manages the lifecycle of a single statically available ExecutorService instance. This holder must be configured as a listener in zk.xml.

package sample;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.zkoss.zk.ui.WebApp;
import org.zkoss.zk.ui.util.WebAppCleanup;
import org.zkoss.zk.ui.util.WebAppInit;

public class SampleExecutorHolder implements WebAppInit, WebAppCleanup {

    private static volatile ExecutorService executor;

    public static ExecutorService getExecutor() {
        return executor;
    }

    @Override
    public void cleanup(WebApp wapp) throws Exception {
        if (executor != null) {
            executor.shutdown();
            System.out.println("ExecutorService shut down");
        }
    }

    @Override
    public void init(WebApp wapp) throws Exception {
        executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        System.out.println("Initialized an ExecutorService");
    }

}

Note that the thread pool is configured using a fixed size based on processors in the system. Proper thread pool sizing is very important, and depends on the type of tasks you intend to execute. The maximum number of threads is also the maximum amount of simultaneous concurrent tasks!

2. Write event classes that model the results of the background task

We’ll use ZK server push to communicate the task results back to the UI, so the results must be modeled as ZK events. It’s always a good idea to create custom subclasses of Event instead of adding the results in the data parameter, because a custom class is more typesafe and can supports multiple fields.

The first event class represents a status update that is sent while the task is still running. In this example it will contain the amount of characters in the input string.

package sample;

import org.zkoss.zk.ui.event.Event;

public class FirstStepEvent extends Event {

    public final int amountOfCharacters;

    public FirstStepEvent(int amountOfCharacters) {
        super("onFirstStepCompleted", null);
        this.amountOfCharacters = amountOfCharacters;
    }

}

The second event class represents the fully completed task. In this example it contains the input string in upper case.

package sample;

import org.zkoss.zk.ui.event.Event;

public class SecondStepEvent extends Event {

    public final String upperCaseResult;

    public SecondStepEvent(String upperCaseResult) {
        super("onSecondStepCompleted", null);
        this.upperCaseResult = upperCaseResult;
    }

}

3. Write the task class

The task class should have the following characteristics:

  • It implements Runnable
  • It takes all required input data as constructor arguments (the data should be immutable if possible!). This input data must be thread-safe, and generally should not include any ZK-related stuff (no components, sessions, etc.). For example, if you want to use a Textbox value as input, read the value in advance and don’t pass the Textbox itself as an argument.
  • It takes a Desktop, and at least one EventListener as constructor arguments. They are needed for sending the results back to the UI

In this example the only input data is a string that will be used to compute the task results.

package sample;

import java.util.Locale;

import org.zkoss.zk.ui.Desktop;
import org.zkoss.zk.ui.DesktopUnavailableException;
import org.zkoss.zk.ui.Executions;
import org.zkoss.zk.ui.event.Event;
import org.zkoss.zk.ui.event.EventListener;

public class SampleTask implements Runnable {

    private final String input;
    private final Desktop desktop;
    private final EventListener<Event> eventListener;

    @SuppressWarnings({ "rawtypes", "unchecked" })
    public SampleTask(String input, Desktop desktop, EventListener eventListener) {
        this.input = input;
        this.desktop = desktop;
        this.eventListener = eventListener;
    }

    @Override
    public void run() {
        try {
            // Step 1
            Thread.sleep(10000);
            Executions.schedule(desktop, eventListener, new FirstStepEvent(input.length()));

            // Step 2
            Thread.sleep(10000);
            Executions.schedule(desktop, eventListener, new SecondStepEvent(input.toUpperCase(Locale.ENGLISH)));
        } catch (DesktopUnavailableException e) {
            System.err.println("Desktop is no longer available: " + desktop);
        } catch (InterruptedException e) {
        }
    }

}

Note how all the constructor arguments are stored in private final fields, and how the input data is immutable (Strings are immutable in Java!). The task simulates long-running processing by using Thread.sleep, and submits a status event when the “processing” is half done.

4. Schedule tasks in ZK composers

Using the task in composers is very simple. You only need to enable server push, and submit a new task instance to the executor. This automatically starts the task once a free background thread is available.

desktop.enableServerPush(true);
// Get the executor from somewhere
executor = SampleExecutorHolder.getExecutor();
executor.execute(new SampleTask(input.getValue(), desktop, this));

In this sample the composer extends GenericForwardComposer, which implements EventListener, so it can itself handle the resulting task events. Both events are handled by methods that update the UI with status information.

public void onFirstStepCompleted(FirstStepEvent event) {
    status.setValue("Task running: " + event.amountOfCharacters + " characters in input");
}

public void onSecondStepCompleted(SecondStepEvent event) {
    status.setValue("Task finished: " + event.upperCaseResult);
}

Final words

It’s quite easy to add robust support for long-running tasks in a ZK application by using this technique. The resulting code in ZK composers is very simple, because the results are passed using the typical Event/EventListener paradigm that is very common within ZK apps.

The biggest dangers in this technique are thread-safety bugs, which can be very difficult to debug. It is absolutely crucial to fully understand the threads where every piece of code is executed, and ensure that all shared state is fully thread-safe. Using immutable input data, and immutable output events is usually enough to ensure safety as long as the background task itself doesn’t access other non-thread-safe resources. Some common mistakes are:

  • Invoking thread-local dependent library methods in the background task (e.g. any method that seems to magically get the “current” value of some type). The background threads will not automatically contain the same thread-local values as servlet threads, so by default all these kind of methods will fail. For example Sessions.getCurrent(), Executions.getCurrent() in ZK, many Spring Security static methods.
  • Passing non-thread-safe parameters to the background task. For example, passing a mutable List that might be modified by the composer while the task is running (always make copies of mutable collections!).
  • Passing non-thread-safe result data in events. For example, passing a List in a result event, while the List will be modified later on in the task (always make copies of mutable collections!).
  • Accessing non-thread-safe methods in the desktop. Even though you have access to the desktop in the background task, most desktop methods are not thread-safe. For example, calling desktop.isAlive() is not guaranteed to return the status correctly (at least in ZK 6.5 the method relies on non-volatile fields, so writes are not guaranteed to be visible in the background thread)
Read More

Java pitfalls: Field access in inner classes

Posted on Feb 28, 2012 | 1 comment

This is not a “pitfall” per se, but an implementation detail worth knowing. Let’s say I have a inner class with a field. Such a field is visible to the enclosing class, but which one of the following ways is the fastest way to access it? Note! I’m only looking here at the generated bytecode, and not considering any JIT optimizations, so this “performance analysis” is very naïve

public class Test {

  public void testAlternatives() {
    // Alternative 1
    System.out.println(new Inner().field);
    // Alternative 2
    System.out.println(new Inner().getField());
    // Alternative 3
    System.out.println(new Inner2().field);
    // Alternative 4
    System.out.println(new Inner2().getField());
  }

  class Inner {
    private int field;

    public int getField() {
      return field;
    }
  }

  class Inner2 {
    int field;

    public int getField() {
      return field;
    }
  }

}

An intuitive answer is that alternatives 1 and 3 are equally fast because the field is always visible to the enclosing class, and both use field access which is overall slightly faster than method access used in alternatives 2 and 4. However, there’s an implementation detail that causes this to be untrue. The JVM itself does not have a concept called “inner classes”. The whole concept is implemented by the Java compiler and in the bytecode level everything consists of normal classes.

The issue here is that if the inner class has a private field, and the compiler will eventually compile the inner class as a normal class. A private field in a normal class cannot be accessed by other classes, so the enclosing Test class cannot “see” the field without some tricks. Here’s the above code “desugared” to what the compiler actually compiles to bytecode:

public class Test {
  public void testAlternatives() {
    // Alternative 1
    System.out.println(Test$Inner.access$000(new Test$Inner(this)));
    // Alternative 2
    System.out.println(new Test$Inner(this).getField());
    // Alternative 3
    System.out.println(new Test$Inner2(this).field);
    // Alternative 4
    System.out.println(new Test$Inner2(this).getField());
  }
}

class Test$Inner {
  final Test this$0;

  private int field;

  Test$Inner(Test test) {
    this$0 = test;
  }

  public int getField() {
    return field;
  }

  static int access$000(Test$Inner inner) {
    return inner.field;
  }

}

class Test$Inner2 {
  final Test this$0;

  int field;

  Test$Inner2(Test test) {
    this$0 = test;
  }

  public int getField() {
    return field;
  }

}

As you can see, a package-level static accessor method called access$000 is generated in order to grant access to the private field. Now it’s easier to see that alternative 3 will most likely be the fastest one, because it is the only one that uses direct field access. Using package access in fields is a micro-optimization, but this whole thing is definitely a detail that should be known by Java developers. In performance-critical parts of code it might actually matter, and the Android performance guide actually mentions this implementation detail.

This implementation detail may also cause slight confusion when field access is attempted on a null reference of the inner class. Consider the following code:

public class NullTest {
  class Inner {
    private int field;
  }

  public void test() {
    Inner inner = null;
    System.out.println(inner.field);
  }

  public static void main(String[] args) {
    new NullTest().test();
  }
}

The variable “inner” is null, so a NullPointerException is obviously thrown. However, what is not apparent from the original code is that the exception is thrown inside the compiler-generated static accessor method!

$ java NullTest
Exception in thread "main" java.lang.NullPointerException
	at NullTest$Inner.access$000(NullTest.java:2)
	at NullTest.test(NullTest.java:8)
	at NullTest.main(NullTest.java:12)

The stack trace contains the intuitive exception source (line 8), but the real source will confuse developers who don’t know about compiler-generated accessor methods.

Read More

Advanced ZK: Asynchronous UI updates and background processing – part 1

Posted on Dec 30, 2011 | 3 comments

Use cases

Asynchronous UI updates are very useful, because they typically improve the responsiveness, usability and the general feel of user interfaces. I’ll be focusing here on the ZK framework, but generally the same principles apply for desktop UIs too (Swing, SWT).

Long-running processing

Sometimes you might have a database query, or an external web service call that takes a long time. Typically these jobs are synchronous, so basically there is a specific point in the code where the system will have to wait for a result and will block the thread that runs the code. If you end up running code like that in an UI thread, it will usually block the UI completely.

Real-time updates

Sometimes you don’t know in advance the exact time when something in the UI should be updated. For example, you could have a visual meter that shows the amount of users in an application. When a new user enters the application, the UIs of the current users should be updated as soon as possible to reflect the new user count. You could use a timer-based mechanism to continuously check if the amount of users has changed, but if there’s too many simultaneous users, the continuous checking will cause a very heavy load even if there is nothing to actually update in the UIs.

Basic concepts

Let’s first digest the title of this blog post: “Asyncronous UI updates and background processing”

Background processing

In the long-running processing use case the most obvious way to reduce UI blocking is to move expensive processing from the UI threads to some background threads. It’s very important to be able to understand what kind of thread will run the code in different parts of your application. For example, in ZK applications, most code is executed by servlet threads which are basically servlet world equivalents to UI threads. In order to execute code in some background thread, we’ll need a thread pool. The easiest way is to use java.util.concurrent.ExecutorService that was introduced in JDK5. We can push Runnable objects to the ExecutorService, so we are basically asking the ExecutorService to run a specific block of code in some background thread.

It is absolutely crucial to understand that frameworks that use ThreadLocals will have problems with this approach because the ThreadLocals that are set in the servlet thread will not be visible in the background thread. An example is Spring Security which by default uses a ThreadLocal to store the security context (= user identity + other things).

Asynchronous UI updates

What does an asynchronous UI update mean in this context? Basically the idea is that once we have some information that we’d like to present in the UI, we’ll inform the UI of the new data (= asynchronous) instead of directly updating the UI in the background thread (= synchronous). We cannot know in advance when the new information is available, so we cannot ask for the information from the client side (unless we use polling which is expensive).

Server push in ZK

With ZK we have basically two different approaches we can use to update the UI once a background thread has new information. The name “server push” comes from the fact that the server has some new data that has to be pushed to the client instead of the typical workflow (client asks the server for information). Firstly, you can do synchronous updates by grabbing exclusive access to a desktop by using Executions.activate/deactivate. Personally I discourage this because once you have exclusive access, UI threads will have to wait until you deactivate the desktop. That’s why I won’t cover this method at all in this blog post.

On the other hand, asynchronous updates are done by using Executions.schedule, which conforms to the Event/EventListener model of normal event processing. The idea is that we can push normal ZK Event objects to EventListeners, and the client side will be informed of these events. After that ZK does a normal AJAX request using Javascript and the Events will be processed by the EventListeners. This means that if we use asynchronous updates, all actual event processing will be done by servlet threads and all ThreadLocals are available as usual. This makes the programming model very simple, because you need to only have normal event listener methods without complex concurrent programming.

Here’s a small example:

public class TestComposer extends GenericForwardComposer {
  private Textbox search;

  public void onClick$startButton() {
    if (desktop.isServerPushEnabled()) {
      desktop.enableServerPush(true);
    }

    final String searchString = search.getValue();
    final EventListener el = this; // All GenericForwardComposers are also EventListeners

    // Don't do this in a real-world application. Use thread pools instead.
    Thread backgroundThread = new Thread() {
       public void run() {
         // In this part of code the ThreadLocals ARE NOT available
         // You must NOT touch any ZK related things (e.g. components, desktops)
         // If you need some information from ZK, you need to get them before this code
         // For example here I've read searchString from a textbox, so I can use the searchString variable without problems
         String result = ... // Retrieve the result from somewhere
         Executions.schedule(desktop, el, new Event("onNewData", null, result));
       }
    };

    backgroundThread.start();
  }
  public void onNewData(Event event) {
    // In this part of code the ThreadLocals ARE available
    String result = (String) event.getData();
    // Do something with result. You can touch any ZK stuff freely, just like when a normal event is posted.
  }
}

In part 2 I show you how to use JDK5 ExecutorServices to run tasks without manually creating threads. If you truly want to understand ZK server push, you should also read the relevant ZK documentation.

Read More

In defense of Scala: Understanding the ++ operator

Posted on Nov 29, 2011 | 18 comments

Many people, including Stephen Colebourne, have complained about some of the methods in the Scala collections API. A common example is the ++ operator:

// There are many variants but this is what Stephen Colebourne used as an example
def ++ [B >: A, That] (that: TraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]) : That

Here’s an example of its use:

  val first = List(1, 2, 3)
  val second = List(4, 5, 6)
  println(first ++ second)
  // List(1, 2, 3, 4, 5, 6)

In this blog post I’ll try to show that the perceived complexity of the definition is a necessary thing arising from several design goals. I’ll try to show code examples in Java so that the overall syntax is familiar for most people. Basically I’m trying to say that by following the following three goals you end up with the same kind of definition that Scala has (except that it cannot be represented with Java without violating any of the goals). If you feel that your grasp of Java generics is lacking or just want the TLDR version, you can skip right away to the monstrosity.

Design goals

1. Immutability

  • The collection classes should be immutable.

2. Type safety

  • The API should be as type safe as possible (explicit instanceofs or casts should not be needed)

3. Conformity with OOP

  • Common methods should also be abstracted to interfaces if possible
  • The collection classes should work with element subtypes/supertypes correctly

Common code

First we have a naive ImmutableList implementation that just wraps a standard Java list:

// ImmutableList.java
public class ImmutableList<A> {
  private final List<A> inner;

  private ImmutableList(List<A> inner) {
    this.inner = inner;
  }

  public static ImmutableList<A> of(A value1, A value2) {
    List<A> inner = new ArrayList<A>();
    inner.add(value1);
    inner.add(value2);
    return new ImmutableList<A>(inner);
  }
}

We’ll also use some data types to demonstrate design goal 3:

public interface Animal {
  String getSound();
}

public class Cat implements Animal {
  public String getSound() {
    return "meow";
  }
}

public class Dog implements Animal {
  public String getSound() {
    return "woof";
  }
}

Attempt 1

// ImmutableList.java
public ImmutableList<A> concat(ImmutableList<A> other) {
  List<A> newInner = new ArrayList<A>();
  newInner.addAll(this.inner);
  newInner.addAll(other.inner);
  return new ImmutableList<A>(newInner);
}

Well that was easy, right?

Nope, the implementation doesn’t support variances (return element type can be super type of T, argument element type can be sub type of T).

ImmutableList<Cat> cats = ImmutableList.of(new Cat(), new Cat());
ImmutableList<Dog> dogs = ImmutableList.of(new Dog(), new Dog());
ImmutableList<Animal> animals = cats.concat(dogs); // Will NOT compile

Attempt 2

As far as I know, it’s not possible to make the above code compile with Java unless you drop generics (= violate goal 2). It’s possible to play with wildcards if you make it a static method instead:

// ImmutableList.java
public static <S> ImmutableList<S> concat(ImmutableList<? extends S> first, ImmutableList<? extends S> second) {
  List<S> newInner = new ArrayList<S>();
  newInner.addAll(first.inner);   
  newInner.addAll(second.inner);  
  return new ImmutableList<S>(newInner);
}

But a static method is a completely different beast, so let’s just ignore this problem for a moment and forget the support for variances. We are violating goal 3, but let’s just go on.

Attempt 3

Now let’s imagine we extend our immutable collection library with another type: ImmutableSet. We want to be able to concat sets too, so we create a concat method (still violating goal 3 regarding variances):

// ImmutableSet.java
public ImmutableSet<A> concat(ImmutableSet<A> other);

Now, based on goal 3 we must abstract this concat method to an interface called ImmutableCollection:

// ImmutableCollection.java
public interface ImmutableCollection<A> {
  ImmutableCollection<A> concat(ImmutableCollection<A> other);
}

Looks good, right?

Nope, the concat method now returns a generic collection, and the actual type is lost. It’s very reasonable to expect this to work:

ImmutableList<Dog> dogs = ImmutableList.of(new Dog(), new Dog());
ImmutableList<Dog> moreDogs = ImmutableList.of(new Dog(), new Dog());
ImmutableList<Dog> fourDogs = dogs.concat(moreDogs); // Won't compile because concat returns ImmutableCollection

The point here is that the returned object is an ImmutableList, but at compile time we cannot abstract the method in ImmutableCollection AND preserve the return type. We can of course cast the return value but we’ll violate goal 3.

Attempt 4

We need a way to describe the implementation type in the interface:

// ImmutableCollection.java
<THAT extends ImmutableCollection<A>> THAT concat(ImmutableCollection<A> other);
// ImmutableList.java
public <THAT extends ImmutableCollection<A>> THAT concat(ImmutableCollection<A> other) {
  // ...
  return new ImmutableList<A>(newInner); // Won't compile!!
}

Now the interface compiles and the dog example works, but the implementation of concat will not compile anymore. The problem is that the implementation of concat must support all ImmutableCollections while the implementation is providing only a subtype (= ImmutableList).

Attempt 5

We must somehow abstract away the construction of the new collection:

// ImmutableBuilder.java
public interface ImmutableBuilder<THAT> {
  THAT build(Collection elements);
}

I’ve violated goal 2 and dropped generics from elements in advance because I know they won’t work. An ImmutableBuilder is just a factory that knows how to create objects of type THAT from a collection of elements. Now we’ll change the signatures of both the interface and the implementation:

// ImmutableCollection.java
<THAT> THAT concat(ImmutableCollection<A> other, ImmutableBuilder<THAT> builder);

// ImmutableList.java
public <THAT> THAT concat(ImmutableCollection<A> other, ImmutableBuilder<THAT> builder) {
  List<A> inner = // ...
  return builder.build(inner);
}

Everything compiles now, so now we only need an implementation of a builder:

// ImmutableList.java
public static <T> ImmutableBuilder<ImmutableList<T>> builder() {
  return new ImmutableBuilder<ImmutableList<T>>() {
    public ImmutableList<T> build(Collection elements) {
      return new ImmutableList<T>(new ArrayList<T>(elements));
    }
  };
}

Overview of the final Java version

We now have everything in place for our immutable collection API. On our journey we have violated goals 2 and 3 (due to language limitations) and added complexity to our implementation. However, the complexity cannot be avoided without violating the design goals. Our API is also pretty clunky to use:

// ImmutableCollection.java
<THAT> THAT concat(ImmutableCollection<A> other, ImmutableBuilder<THAT> builder);
ImmutableBuilder<ImmutableList<Dog>> listBuilder = ImmutableList.builder();

ImmutableList<Dog> dogs = ImmutableList.of(new Dog(), new Dog());
ImmutableList<Dog> moreDogs = ImmutableList.of(new Dog(), new Dog());
ImmutableList<Dog> fourDogs = dogs.concat(moreDogs, listBuilder);

We don’t support variances and we have to pass around a builder for most of the operations. Due to goal 1, most interesting operations on the collections will return new ones, and thus will need a builder instance!

Stephen Colebourne wrote about ++ in his blog post:

“In fact this is the equivalent to Java’s addAll() on a list”

Reading this made me completely lose my respect to the guy. This is absolutely wrong and misses completely the crucial difference: immutability (= goal 1). Java’s addAll() doesn’t return a new collection, so it does not need ImmutableBuilder and variances cause less problems because there is no return type. However, Java collections are generally not immutable and thus violate goal 1. It is a completely different thing if you want mutable collections, but then you would have completely different requirements and you are not talking about Scala’s ++ operator anymore. If you want OOP-compatible, immutable, type-safe collections, you will end up with something that Scala has. That is why I consider complexity arising from goal 1 to be necessary. It’s also worth nothing that you don’t always have to understand the exact details of a method signature, and in the collections API the CanBuildFrom stuff is pretty much same everywhere.

Comparison with the Scala version

Java:

<THAT> THAT concat(ImmutableCollection<A> other, ImmutableBuilder<THAT> builder);

Scala:

def ++ [B >: A, That] (that: TraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]) : That

First thing you should note is that they are not actually equivalent. The Scala version does not violate any goal, but the Java version violates goals 2 and 3. Let’s look at all the type parameters we have here:

A
Element type of the first collection
That/THAT
The type that we expect to get out of the operation
B
Element type of the second collection. The definition essentially says that type B can be a supertype of A.

The next thing you should note is CanBuildFrom in the Scala version. CanBuildFrom[List[A], B, That] is a factory object that can create objects with type That, element type B from Lists of element type A. Sounds complex, but it’s not that difficult to understand. Our Java version is simpler (no A/B type parameters), but it violates our design goals. The presence of parameters B and That in the Scala version makes my previous dog/cat example possible and if you run that with Scala you will get a list of animals.

The CanBuildFrom parameter in the Scala version is implicit, meaning that it will automatically be added by the compiler to invocations based on the import scoping if possible. So 99% of the time you will not need to actually handle the parameter yourself. In the Java version you have to include the builder parameter all the time. I really have trouble understanding the fear of implicits. Many things that would be implicits in good Scala code are typically pushed to static variables or ThreadLocals in Java (this is of course not the case with the collections API).

About the syntax

Finally I must also mention that many people complain the use of operators instead of named methods. Personally I have no problem with this as long at it is not overused. The ++ operator is probably based on Haskell where list concat is done with that operator.

I’d also like to ask why don’t the complainers complain about other operators? The trivial answer is of course that most operators like +, – are based on mathematics and as such are “common knowledge”. But what about operators like modulo (% in some languages, mod in others)? What about the power operator (^, ^^ or **)? I don’t see many Python programmers crying about ** even though the syntax is not based on mathematics directly.

On the other hand, good syntax is always a matter of taste. I like the fact that for example scalaz has both named and operator versions of most important methods.

Read More