
Nuances of the Java 5.0 for-each Loop

November 7, 2006

This article is an in-depth examination of one of the simplest but
most pleasing features in Java 5.0--the for-each loop. I
present eleven short items discussing various nuances of usage,
pitfalls to be aware of, and possible optimizations surrounding the
use of the for-each loop. In the first section, I discuss what
kinds of iteration are possible with the for-each. The next section
illustrates common programming errors in using the for-each loop.
The final section shows how to write new classes that can be used
as targets of a for-each loop. I also talk about advanced
implementations that allow multiple iterable views; lazily
construct objects just in time for iteration; and enable possible
generic algorithm and compiler optimizations of the for-each
loop.

The for-each Loop: What Is It? What Can It Do? What Can't
It Do?

Sun documents sometimes refer to for-each as the "enhanced for
loop," calling the old version the "basic for loop." Designed
specifically for iterating over arrays (and collections), for-each
eliminates much of the clutter surrounding the traditional
for or while loop involving the setup and
management of an index variable or iterators. Let us look at an
example:

[prettify]public Integer sum(List<Integer> liszt) {
    int total = 0;
    for(int num : liszt) {
        total += num;
    }
    return total;
}
[/prettify]

The official guides suggest that the colon (:) be read as "in"--the above listing would read as "for [each] num in liszt do
total += num." By not introducing a new
keyword (such as foreach) into the language, the
designers preserved strict backward compatibility with any pre-Java
5.0 code that may have used that word as an identifier.

Item 1: The for-each loop allows only one iteration
variable, which precludes simultaneously iterating over two or more
arrays or collections.
For example, in computing the dot
product of two vectors, I need to multiply together the
corresponding elements of both vectors, which needs two iterators.
This cannot be coded using the "enhanced for" loop; the basic for
loop with two explicit iterators is needed instead:

[prettify]
public static Integer dotProduct(List<Integer> u, List<Integer> v) {
     assert  (u.size() == v.size());
     int product = 0;
     for(Iterator<Integer> u_it = u.iterator(), v_it = v.iterator();
                  u_it.hasNext()  && v_it.hasNext();
                  product += u_it.next() * v_it.next())
           ; // no body
     return product;
} 
[/prettify]

Item 2. Nested iteration is possible. Nested
iteration is different from simultaneous iteration: the inner loop
is finished before each outer iteration. The for-each loop easily
accommodates this. For instance, in the product of two matrices,
the element at the location [i,j] is the dot product
of the ith row and jth column of the
multiplied matrices:

[prettify]public static void matrixProduct(Matrix x, Matrix y, Matrix product) {
   // each row of x must be as long as each column of y
   assert  (x.cols().size() == y.rows().size());

   int rNum = 0;
   for (List<Integer> xRow : x.rows()) {
       int cNum = 0;
       for (List<Integer> yCol : y.cols()) {
           product.setElement(rNum, cNum, dotProduct(xRow,yCol));
           cNum++;
       }
       rNum++;
   }
}

// where the matrix is defined as...
public interface Matrix {
    public List<List<Integer>> rows();
    public List<List<Integer>> cols();
    public void setElement(int rowNum, int colNum, Integer value);
    // ... other methods
}
[/prettify]

The Equivalent Basic for Loop and Its Implications on for-each
Usage

In effect, the for-each loop is nothing more than syntactic
sugar. It makes the code much easier to read, but provides no new
core functionality. In fact, the Java Language Specification
(http://java.sun.com/docs/books/jls/third_edition/html/statements.html#14.14.2)
describes the meaning of the new for-each loop in terms of an
equivalent basic for loop. Essentially, an intermediate
Iterator of the right kind is created, and its name is
guaranteed to differ from every other identifier
in scope. Below, I show the equivalent translation
for the first example. Comparing it with the original listing shows
how much clutter the new syntax removes: the programmer typed only
the loop variable, the collection, and the loop body.

[prettify]
public Integer sum (List<Integer> liszt) {
    int total=0;
    for(Iterator<Integer> iter=liszt.iterator(); iter.hasNext();) {
        int num = ((Integer)iter.next()).intValue();
        total += num;
    }
    return Integer.valueOf(total);
}
[/prettify]

I now examine the deeper implications of this equivalence.

Item 3: Beware of auto-unboxing. In the
previous example, the Integer item returned by
iter.next() has to be put into an int
variable. This involves auto-unboxing
(http://java.sun.com/j2se/1.5.0/docs/guide/language/autoboxing.html),
which is shown explicitly as the call to
Integer.intValue(). This can affect the
performance of large iterations. Unless you are forced to use
primitive types because of their support for arithmetic operations, it
is preferable to use the wrapper subclasses of Number
(Integer, Double, etc.) for loop variables
rather than the equivalent primitive types (int,
double, etc.).

An exception to this rule is when the iteration is over an array
of a primitive numeric type. For example, if liszt were an
int[], there would be no auto-unboxing involved.
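
As a rough sketch of the difference (the class and method names below are illustrative and do not come from the listings above), the three loops here bind their loop variables in three different ways. Only the first one unboxes an element on every pass:

```java
import java.util.Arrays;
import java.util.List;

class UnboxingDemo {
    // Primitive loop variable over a List<Integer>:
    // every element is unboxed through Integer.intValue().
    static int sumUnboxing(List<Integer> values) {
        int total = 0;
        for (int num : values) {
            total += num;
        }
        return total;
    }

    // Boxed loop variable: binding the variable needs no unboxing,
    // and a null element can be inspected instead of causing an exception.
    static int countNonNull(List<Integer> values) {
        int count = 0;
        for (Integer num : values) {
            if (num != null) {
                count++;
            }
        }
        return count;
    }

    // Array of primitives: no boxing or unboxing is involved at all.
    static int sumArray(int[] values) {
        int total = 0;
        for (int num : values) {
            total += num;
        }
        return total;
    }
}
```

Whether the unboxing cost matters in practice depends on the size of the iteration and on the JVM, so it is worth measuring before restructuring code around it.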

Item 4: Watch out for null pointers. The first
example never explicitly de-references liszt (by, for
example, calling a method such as liszt.add()). Yet,
sum() can raise a NullPointerException if
the liszt passed in is null. This
particularly insidious error happens because of the hidden
compiler-generated call to get the iterator. To make the code
robust, I need to preface the for-each loop with a check like:

if (null != liszt) for (int num : liszt) { /* for loop body... */ }
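
A minimal sketch of the failure and the guard follows. The class name NullSafeSum and the decision to treat a null list as an empty one are assumptions made for this illustration; a real API might prefer to reject null outright.

```java
import java.util.Arrays;
import java.util.List;

class NullSafeSum {
    // Unguarded: the hidden call to liszt.iterator() throws
    // a NullPointerException when liszt is null.
    static int sum(List<Integer> liszt) {
        int total = 0;
        for (int num : liszt) {
            total += num;
        }
        return total;
    }

    // Guarded: a null list is treated as empty (an assumption of this sketch).
    static int sumOrZero(List<Integer> liszt) {
        if (liszt == null) {
            return 0;
        }
        return sum(liszt);
    }
}
```

Calling NullSafeSum.sum(null) fails even though the method body never mentions a method of liszt, whereas NullSafeSum.sumOrZero(null) returns normally.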

Item 5. Iterating over varargs. Java 5.0 now
allows a variable number of arguments
(http://www.onjava.com/pub/a/onjava/excerpt/javaadn_chap5/index.html)
of a single type to be passed in
as the last parameter to a method. The compiler collects these
varargs parameters into an array of that type. Typically, the body
of the method treats the varargs as an array and iterates over them
using a for-each loop.

[prettify]public Integer sum(Integer... intArray) {
    int total = 0;
    for (int num : intArray) {
        total += num;
    }
    return total;
}
[/prettify]

If no vararg argument is specified in calling the method (e.g.,
a call like sum()), the compiler replaces the call
with a zero-length array (sum(new Integer[0])). Thus,
in the body of the method, I may safely drop the check for a null
array argument.

Note that if a null is explicitly passed in without
a cast (i.e., sum(null)), the compiler issues a warning
as it cannot disambiguate whether the intended argument is the null
array (Integer[])null, which would cause a
NullPointerException as warned in Item 4 above, or a
vararg argument (Integer)null, which is
wrapped in the single-item array
{(Integer)null}. Even in the latter case, the sum() method above
would still fail, because unboxing the null element into the int
loop variable throws a NullPointerException (Item 3).
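
The three cases can be told apart with a small helper. In this sketch, VarargsDemo and countArgs are hypothetical names; the loop variable is deliberately a boxed Integer so that a null element does not trigger unboxing.

```java
class VarargsDemo {
    // Counts the arguments by iterating over the varargs array.
    static int countArgs(Integer... values) {
        int count = 0;
        for (Integer value : values) { // fails here if values itself is null
            count++;
        }
        return count;
    }
}
```

With this helper, countArgs() sees a zero-length array, countArgs((Integer) null) sees a one-element array holding null, and countArgs((Integer[]) null) throws a NullPointerException when the loop starts.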

Item 6. Return zero length arrays or empty lists rather
than nulls. Client code will look cleaner because the
check for whether the returned value was null can be
safely omitted. If all API code followed this rule, Item 4 would
become nearly obsolete! In Item 27 of Effective
Java (http://java.sun.com/docs/books/effective/), Joshua Bloch points out that returning nulls for
special cases involves writing the same amount of code as returning
a zero length array. One argument for returning nulls is that it
does not involve needless object creation. While the effect of
object creation on the performance of an average method is
questionable, I could avoid the issue altogether by returning the
same immutable instance each time the method is called. After all,
immutability guarantees safety in shareability (Item 13 of
Effective Java). Empty arrays are immutable, and immutable empty
lists of any type can be obtained by calling the generic method
java.util.Collections.emptyList(). The first return statement in
the listing below shows how to do this:

[prettify]public class ArrayBackedMatrix implements Matrix {
    private Integer[][] array;
    public List<List<Integer>> rows() {
        if (array.length == 0) {
            return Collections.emptyList(); // rather than return null;
        }
        List<List<Integer>> retList = new ArrayList<List<Integer>>();
        for (Integer[] row : array) {
            retList.add(Arrays.asList(row));
        }
        return retList;
    }
//... other methods
}
[/prettify]

Item 7. Do not modify the list during
iteration. While for-each syntax does not provide direct
access to the iterator used by the equivalent basic for loop, the
list can be modified by directly calling other methods on the list.
Doing so can lead to indeterminate program behavior. In particular,
if the compiler-inserted call to iterator() returns a
fail-fast iterator, a
java.util.ConcurrentModificationException runtime
exception may be thrown. But this is only done on a best-effort basis,
and cannot be relied upon except as a means of detecting a bug when
the exception does get thrown. On my JVM, the following behavior
was seen: in a list of integers, removing the second-to-last
element did not cause an exception (the loop simply ended early,
without ever visiting the last element), but removing an earlier
element did:

[prettify]List<Integer> liszt = new ArrayList<Integer>();
liszt.add(0);liszt.add(1);liszt.add(2);liszt.add(3);liszt.add(4);
 // liszt.add(5); Uncomment to see ConcurrentModificationException

for(int item: liszt) { // the iterator returned by java.util.ArrayList is fail-fast
   if(item==3)
      liszt.remove(Integer.valueOf(item)); //after this, loop behavior is indeterminate
   else System.out.println(item);
}
[/prettify]
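
When elements really do have to be removed during traversal, one common alternative is to fall back to the basic for loop and remove through the iterator itself, which the iterators of ArrayList support. The sketch below assumes a list without null elements; SafeRemoval and removeAll are illustrative names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SafeRemoval {
    // Removes every occurrence of target using Iterator.remove(),
    // so the iterator stays consistent with the list it traverses.
    static void removeAll(List<Integer> liszt, int target) {
        for (Iterator<Integer> it = liszt.iterator(); it.hasNext();) {
            if (it.next() == target) { // unboxes; assumes no null elements
                it.remove();
            }
        }
    }
}
```

Because the removal goes through the iterator, the traversal continues normally and the remaining elements are all visited.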

You Don't Need to Return java.util.List!

The examples we have seen thus far would suggest that iteration
using for-each requires using instances of
java.util.List. But observing the JLS-specified
equivalent basic for loop from the fourth listing closely, we can
see that the only List method invoked is the
iterator() method. In fact, the Java Language
Specification only requires an array or a class that implements the
java.lang.Iterable interface. This interface defines a
single method, iterator(), which returns the
java.util.Iterator that the for-each loop uses. Thus,
any class can be made the target of for-each loops simply by
defining an iterator with next() and
hasNext() methods. (Iterator.remove() is
never called in the for-each, so a bare-bones iterator
implementation of remove() can throw an
UnsupportedOperationException.)
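
As a minimal sketch of such a class, the hypothetical IntRange below is not a collection at all, yet it can be the target of a for-each loop because it implements Iterable. The name and the half-open range convention are assumptions made for this illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class IntRange implements Iterable<Integer> {
    private final int start; // inclusive
    private final int end;   // exclusive

    IntRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = start; // value the next call to next() returns

            public boolean hasNext() {
                return next < end;
            }

            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++;
            }

            // Never called by for-each, so a bare-bones version suffices.
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

A loop such as for (int i : new IntRange(1, 5)) then visits 1, 2, 3, and 4 without any list ever being built.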

Item 8. Consider returning an Iterable
rather than a List.
Even if you are not
defining a custom iterator, if a method in your API is returning a
List that is expected to be solely used in for-each
loops, consider changing the return type to
java.lang.Iterable instead. This effectively hides the
details of the current implementation and allows for later
optimizations such as lazy construction of each iterated item (Item
10).

Item 9. Consider returning Iterable views rather than
implementing Iterable.
The Java collections
classes can be used in for-each loops because they implement
java.lang.Iterable. If a class implements
java.lang.Iterable, then all its subclasses inherit
its iterator() method. Overriding subclasses are still
expected to follow the semantics of the parent class. This may be
acceptable in most cases, but the issue can be avoided by creating
a method to return an Iterable view of the class. Subclasses can
then implement Iterable and return one of the iterable
views in the body of the iterator() method. This has
the added advantage that multiple iterable views can be defined,
with different sort orders. Some iterable views can even return
relevant subsets of the collection being iterated over.

In the example below, if City itself implements
Iterable, a for-each loop using the City
class would look like

for (Resident resident : city) { /* ... */ }

If City does not implement
Iterable, clients need to make an explicit call to
obtain one of the Iterable views of the City. The code
still reads just as naturally as before; e.g.,

for (Resident resident : city.residents()) { /* ... */ }

[prettify]public interface City //extends Iterable<Resident>
{
    // By extending Iterable, only one Iterable view can be supported
    // Iterator<Resident> iterator();

    // the default iterable view,
    // which could be returned by iterator() of most subclasses;
    Iterable<Resident> residents();

    // return same set of objects as residents(), but different sort order
    Iterable<Resident> residentsInAlphaOrder();

    // return a subset of residents() who satisfy an age barrier test
    Iterable<Resident> adultResidents();
}
[/prettify]
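
One possible way to back these views is sketched below. Everything in it is an assumption for illustration: the Resident fields, the SimpleCity class (which mirrors the three view methods of City without formally implementing the interface, to keep the block self-contained), and the age threshold of 18.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class Resident {
    final String name;
    final int age;

    Resident(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

class SimpleCity {
    private final List<Resident> residents = new ArrayList<Resident>();

    void add(Resident resident) {
        residents.add(resident);
    }

    // Default view: insertion order, read-only.
    Iterable<Resident> residents() {
        return Collections.unmodifiableList(residents);
    }

    // Same residents, sorted by name (iterates over a sorted copy).
    Iterable<Resident> residentsInAlphaOrder() {
        List<Resident> sorted = new ArrayList<Resident>(residents);
        Collections.sort(sorted, new Comparator<Resident>() {
            public int compare(Resident a, Resident b) {
                return a.name.compareTo(b.name);
            }
        });
        return sorted;
    }

    // Subset view: residents aged 18 or over (threshold is hypothetical).
    Iterable<Resident> adultResidents() {
        List<Resident> adults = new ArrayList<Resident>();
        for (Resident resident : residents) {
            if (resident.age >= 18) {
                adults.add(resident);
            }
        }
        return adults;
    }
}
```

Because each method only promises an Iterable, the copies made here could later be replaced by lazily computed views without affecting client loops.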

Item 10. Consider lazy construction of iterated
items. Many times, you may need to iterate over large
lists of items that cannot all be accommodated in memory; for
instance, when iterating over large ResultSets
returned from a JDBC query. At other times, it may be expensive to
compute each object in the list. A very useful technique in such
cases is to return an iterator and construct each item only when
required.

The example below shows a generic class that can wrap any
ResultSet and make it the target of a for-each loop.
It uses a callback factory method to create Java objects from each
row of the ResultSet. Note that only one
ResultSet row needs to be in memory at any point in
the iteration. The example also shows how to use the wrapper
with a Person class whose two fields are read from the
columns of the ResultSet rows.

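[prettify]import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class IterableResultSetWrapper<T>
    implements Iterable<T>{
    public static interface ResultSetRowReader<T> {
        // construct a T from a ResultSet row
        T create(ResultSet resultRow) throws SQLException;
    }
    public IterableResultSetWrapper(ResultSet results,
                      ResultSetRowReader<T> rowFactory) {
        this.results = results; this.rowFactory = rowFactory;
    }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            // true once the cursor sits on a row that next() has not returned yet
            private boolean advanced = false;
            private boolean hasRow = false;
            public boolean hasNext() {
                if (!advanced) {
                    // ResultSet.next() returns false when the rows run out,
                    // which also covers an empty ResultSet
                    try {hasRow = results.next();}
                    catch (SQLException e) {throw new RuntimeException(e);}
                    advanced = true;
                }
                return hasRow;
            }
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                advanced = false;
                // report failures instead of silently returning null
                try {return rowFactory.create(results);}
                catch (SQLException e) {throw new RuntimeException(e);}
            }
            public void remove() {
                throw new UnsupportedOperationException();
            }
        }; // end new Iterator
    }
    private ResultSet results;
    private ResultSetRowReader<T> rowFactory;
}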

// Test code to read Persons from ResultSets
public class Test {
  public static class Person {
    public Person(String name, int age) { /* ... */ }
    public static
     IterableResultSetWrapper.ResultSetRowReader<Person> resultSetReader
         = new IterableResultSetWrapper.ResultSetRowReader<Person>() {
           public Person create(ResultSet resultRow) throws SQLException {
               // create a Person by reading the
               // first and second cols of ResultSet
               return new Person(resultRow.getString(1),
                                 resultRow.getInt(2));
           }
       }; // end new ResultSetRowReader
  }

  public static void personForEachExample(ResultSet results) {
      Iterable<Person> iterablePersonResultSet =
           new IterableResultSetWrapper<Person> (
               results, Person.resultSetReader);
      for(Person person: iterablePersonResultSet) {
          // process person object
      }
  }
}
[/prettify]

Sometimes, legacy APIs might have a return type of
List. The above lazy construction strategy can still
be used in such cases, by returning a custom subtype of
java.util.AbstractSequentialList, which provides
implementations for all other List operations on top of a
java.util.ListIterator. The subtype can then implement
a ListIterator that constructs each item lazily as
above.
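
The following is a small sketch of that idea, under the assumption that each item can be computed from its position alone. LazySquares is a hypothetical read-only list whose elements (the squares of their indices) are created only when the iterator reaches them; a ResultSet-backed version would follow the same shape.

```java
import java.util.AbstractSequentialList;
import java.util.ListIterator;
import java.util.NoSuchElementException;

class LazySquares extends AbstractSequentialList<Integer> {
    private final int size;

    LazySquares(int size) {
        this.size = size;
    }

    public int size() {
        return size;
    }

    // AbstractSequentialList builds get(), iterator(), etc. on top of this.
    public ListIterator<Integer> listIterator(final int index) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return new ListIterator<Integer>() {
            private int cursor = index;

            public boolean hasNext() { return cursor < size; }

            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                int value = cursor * cursor; // the item is constructed only now
                cursor++;
                return value;
            }

            public boolean hasPrevious() { return cursor > 0; }

            public Integer previous() {
                if (!hasPrevious()) throw new NoSuchElementException();
                cursor--;
                return cursor * cursor;
            }

            public int nextIndex() { return cursor; }
            public int previousIndex() { return cursor - 1; }

            // Read-only list: the modification methods are unsupported.
            public void remove() { throw new UnsupportedOperationException(); }
            public void set(Integer e) { throw new UnsupportedOperationException(); }
            public void add(Integer e) { throw new UnsupportedOperationException(); }
        };
    }
}
```

Clients that expect a List can iterate over it or call get(), and no element exists until it is asked for.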

Item 11. When appropriate, implement
java.util.RandomAccess to allow for compiler
optimizations.
This rule applies to custom implementations
of the List interface that could be the targets of
a for-each loop. The language specification only stipulates that
for-each should behave like its equivalent basic for loop.
The compiler could, for example, translate the for-each from the
first example as below:

[prettify]public int sum(List<Integer> liszt) {
    int total = 0;
    for(int i=0; i < liszt.size(); i++) {
        total += liszt.get(i).intValue();
    }
    return total;
}
[/prettify]

If, for typical instances of a particular List implementation, the above loop would work faster than the JLS-specified equivalent basic for loop shown in the fourth code
listing, then that List should implement
java.util.RandomAccess. This is just a marker
interface--it declares no methods--and is meant to provide a
means for generic methods to optimize their for loops involving
Lists if the List implementation's random access mechanisms are
faster than iteration. For example, in the Java Collections
framework, ArrayList, Stack, and
Vector all implement RandomAccess. A compiler
could produce the following optimized translation of the first
example:

[prettify]public int sum(List<Integer> liszt) {
    int total = 0;
    if ( liszt instanceof RandomAccess)
        for (int i=0; i < liszt.size(); i++) {
            total += liszt.get(i).intValue();
        }
    else
        for (Iterator<Integer> iter=liszt.iterator(); iter.hasNext();) {
            int num = ((Integer)iter.next()).intValue();
            total += num;
        }
    return total;
}
[/prettify]

Thus the language designers' choice of hiding the
Iterator in the for-each provides unexpected
opportunities for compiler optimization. This is also an argument
for using for-each instead of the explicit iterator: the compiler
cannot replace an explicit method call to
iterator.next(), as the programmer might have
purposely used that method in order to cause a known side effect
(say, a println() in the body of next()).
If the compiler optimized the iterator loop and replaced it with
its equivalent random access loop, the println() would
never be seen. Thus the optimization can only be performed if the
iterator is not explicitly called; i.e., if for-each is used
instead, then the compiler is free to use any for-each
implementation in keeping with the language specification.
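
For a custom list, opting in to Item 11 takes only an implements clause. In the sketch below, IntArrayList is a hypothetical array-backed list whose get() runs in constant time, and prefersIndexing is an illustrative helper showing the test that a generic method (or an optimizing compiler) could apply.

```java
import java.util.AbstractList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical read-only list backed by an int array.
// Indexed access is constant time, so it declares RandomAccess.
class IntArrayList extends AbstractList<Integer> implements RandomAccess {
    private final int[] data;

    IntArrayList(int... data) {
        this.data = data.clone(); // defensive copy
    }

    public Integer get(int index) {
        return data[index];
    }

    public int size() {
        return data.length;
    }
}

class RandomAccessCheck {
    // True if indexed loops are expected to be at least as fast as iteration.
    static boolean prefersIndexing(List<?> list) {
        return list instanceof RandomAccess;
    }
}
```

With these definitions, prefersIndexing accepts an IntArrayList but not a java.util.LinkedList, which does not implement the marker interface.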

Conclusion

In this article, I looked at various nuances of the enhanced for
loop introduced in Java 5.0. I showed that nested iteration was
possible, but simultaneous iteration over multiple collections is
not supported by this syntax (Items 1 and 2). I discussed possible
dangers to be aware of, such as auto-unboxing, null pointers in the
for-each loop, and concurrent modification of the iterated
collection (Items 3-7). I then illustrated how to create custom
classes that can be used in for-each loops, and presented
strategies for creating multiple iterable views (Item 9) and
optimizations for writing Iterable classes (lazy construction, Item
10, and RandomAccess for generic algorithms and compiler
optimizations, Item 11).


Nishanth Sastry is a software developer at IBM Research.