Scala Collections Tips and Tricks

Library bookshelfThis article presents a list of simplifications and optimizations of typical Scala Collections API usages.

Some of the tips rest upon subtle implementation details, though most of the recipes are just common sense transformations that, in practice, are often overlooked.

The list is inspired by my efforts to devise practical Scala Collections inspections for the IntelliJ Scala plugin. We’re now in process of implementing those inspections, so if you use the plugin in IDEA, you’ll automatically benefit from static code analysis.

Nevertheless, the recipes are valuable by themselves and can help you to deepen your understanding of Scala Collections and to make your code faster and cleaner.

Update:
If you feel adventurous, you can learn how to contribute to IntelliJ Scala plugin and try your hand at “up for grabs” collection inspections that are to your liking.

The article is also (independently) translated into Russian and Chinese (one more translation).

Contents:

  1. Legend
  2. Composition
  3. Side effects
  4. Sequences
    4.1. Creation
    4.2. Length
    4.3. Equality
    4.4. Indexing
    4.5. Existence
    4.6. Filtering
    4.7. Sorting
    4.8. Reduction
    4.9. Matching
    4.10. Rewriting
  5. Sets
  6. Options
    6.1. Value
    6.2. Null
    6.3. Processing
    6.4. Rewriting
  7. Maps
  8. Supplement

All the code examples are available as a GitHub repository.

1. Legend

To make code examples more comprehensible, I adhered to the following notation conventions:

  • seq — an instance of Seq-based collection, like Seq(1, 2, 3)
  • set — an instance of Set, for example Set(1, 2, 3)
  • array — an array, e. g. Array(1, 2, 3)
  • option — an instance of Option, for example, Some(1)
  • map — an instance of Map, like Map(1 -> "foo", 2 -> "bar")
  • ??? — an arbitrary expression
  • p — a predicate function with type T => Boolean, like _ > 2
  • n — an integer value
  • i — an integer index
  • f, g — simple functions, A => B
  • x, y — some arbitrary values
  • z — initial or default value
  • P — a pattern

2. Composition

Keep in mind that although all the recipes are isolated and self-contained, we can compose them to iteratively transform more advanced expressions, for example:

seq.filter(_ == x).headOption != None

// Via seq.filter(p).headOption -> seq.find(p)

seq.find(_ == x) != None

// Via option != None -> option.isDefined

seq.find(_ == x).isDefined

// Via seq.find(p).isDefined -> seq.exists(p)

seq.exists(_ == x)

// Via seq.exists(_ == x) -> seq.contains(x)

seq.contains(x)

So, we can rely on “substitution model of recipe application” (by analogy with SICP) to simplify complex expressions.

3. Side Effects

“Side effect” is an essential concept that should be reviewed before enumerating the actual transformations.

Basically, side effect is any action besides returning a value, that is observable outside of a function or expression scope, like:

  • input/output operation,
  • variable modification (that is accessible outside of the scope),
  • object state modification (that is observable outside of the scope),
  • throwing an exception (that is not caught within the scope).

When a function or expression contain any of these actions, they are said to have side effects, otherwise they are said to be “pure”.

Why side effects are such a big deal? Because in the presence of side effects, the order of evaluation matters. For example, here are two “pure” expressions (assigned to the corresponding values):

val x = 1 + 2
val y = 2 + 3

Because they contain no side effects (i. e. effects that are observable outside of the expressions), we may actually evaluate those expressions in any order — x then y, or y then x — it doesn’t disrupt evaluation correctness (we may even cache the result values, if we want to). Now let’s consider the following modification:

val x = { print("foo"); 1 + 2 }
val y = { print("bar"); 2 + 3 }

That’s another story — we cannot reverse the order of evaluation, because, that way, “barfoo” instead of “foobar” will be printed to console (and that is not what we expect).

So, the presence of side effects reduces the number of possible transformations (including simplifications and optimizations) that we may apply to code.

The same reasoning can be applied to collection-related expressions. Let’s imagine that we have some out-of-scope builder (with side-effecting method append):

seq.filter(x => { builder.append(x); x > 3 }).headOption

In principle, seq.filter(p).headOption construct is reducible to seq.find(p) call, however the side effect presence forbids us to do that. Here’s an attempt:

seq.find(x => { builder.append(x); x > 3 })

While those expressions are equivalent in terms of result value, they are not equivalent in regard to side effects. The former expression will append all the elements, while the latter one will omit elements after a first predicate match. So, such a simplification cannot be performed.

What can we do to make the automatic simplification possible? The answer is a golden rule that should be applied to all side effects in our code (including code with no collections at all):

  • avoid side effects, whenever possible,
  • isolate side effect from pure code, otherwise.

Thus, we need either to get rid of the builder (with its side-effecting API), or separate the call to builder from the pure expression. Let’s assume that this builder is some third-party object that we cannot get rid of, so we have to isolate the call:

seq.foreach(builder.append)
seq.filter(_ > 3).headOption

Now we can safely apply the transformation:

seq.foreach(builder.append)
seq.find(x > 3)

Nice and clean! The isolation of the side effect made automatic transformation possible. An additional benefit is that, because of the clean separation, the resulted code is easier to comprehend by humans.

Less obvious, yet very important benefit of side effect isolation, is that our code becomes more robust, irrelatively of the possible simplifications. As applied to the example, here’s why: the original expression might produce different side effects depending on the actual Seq implementation — with Vector, for example, it will append all the elements, yet with Stream it will omit elements after a first predicate match (because streams are “lazy” — elements are only evaluated when they are needed). The side effect isolation allowed us to avoid this indeterminate behavior.

4. Sequences

While tips in the current section are intended for classes that are descendants of Seq, some transformations are applicable to other collection (and non-collection) classes, like Set, Option, Map and even Iterator (because all of them provide similar interfaces with monadic methods).

4.1 Creation

Create empty collections explicitly
// Before
Seq[T]()

// After
Seq.empty[T]

Some immutable collection classes provide singleton “empty” implementations, however not all of the factory methods check length of the created collections. Thus, by making collection emptiness apparent at compile time, we could save either heap space (by reusing empty collection instances) or CPU cycles (otherwise wasted on runtime length checks).

Also applicable to: Set, Option, Map, Iterator.

4.2 Length

Prefer length to size for arrays
// Before
array.size

// After
array.length

While size and length are basically synonyms, in Scala 2.11 Array.size calls are still implemented via implicit conversion, so that intermediate wrapper objects are created for every method call. Unless you enable escape analysis in JVM , those temporary objects will burden GC and can potentially degrade code performance (especially, within loops).

Don’t negate emptiness-related properties
// Before
!seq.isEmpty
!seq.nonEmpty

// After
seq.nonEmpty
seq.isEmpty

Simple property adds less visual clutter than a compound expression.

Also applicable to: Set, Option, Map, Iterator.

Don’t compute length for emptiness check
// Before
seq.length > 0
seq.length != 0
seq.length == 0

// After
seq.nonEmpty
seq.nonEmpty
seq.isEmpty

On the one hand, simple property is more easy to perceive than compound expression. On the other hand — collections that are decedents of LinearSeq (like List) can take O(n) time to compute length (instead of O(1) time for IndexedSeq), so we can speedup our code by avoiding length computations when exact value is not really required.

Additionally, keep in mind that .length call on an infinite stream will never complete, yet we can always verify stream emptiness directly.

Also applicable to: Set, Map.

Don’t compute full length for length matching
// Before
seq.length > n
seq.length < n
seq.length == n
seq.length != n

// After
seq.lengthCompare(n) > 0
seq.lengthCompare(n) < 0
seq.lengthCompare(n) == 0
seq.lengthCompare(n) != 0

Because length calculation might be “expensive” computation for some collection classes, we can reduce comparison time from O(length) to O(length min n) for decedents of LinearSeq (which might be hidden behind Seq-typed values).

Besides, such approach is indispensable when we’re dealing with infinite streams.

Don’t use exists for emptiness check
// Before
seq.exists(_ => true)
seq.exists(const(true))

// After
seq.nonEmpty

It goes without saying that such a trick is completely redundant.

Also applicable to: Set, Option, Map, Iterator.

4.3 Equality

Don’t rely on == to compare array contents
// Before
array1 == array2

// After
array1.sameElements(array2)

Equality check always produces false for different array instances.

Also applicable to: Iterator.

Don’t check equality between collections in different categories
// Before
seq == set

// After
seq.toSet == set

Equality checks cannot be used to compare collections in different categories (i. e. List vs. Set).

Please note that you should think twice about exact meaning of such a check (in reference to the example — how to treat duplicates in the sequence).

Don’t use sameElements to compare ordinary collections
// Before
seq1.sameElements(seq2)

// After
seq1 == seq2

Equality check is the way to go when we’re comparing collections in the same category. In theory, that might improve performance because of possible underlying instance check (eq, which is usually much faster).

Don’t check equality correspondence manually
// Before
seq1.corresponds(seq2)(_ == _)

// After
seq1 == seq2

We have a built-in method that does the same. Both expressions respect order of elements. Again, we might benefit from performance improvement.

4.4 Indexing

Don’t retrieve first element by index
// Before
seq(0)

// After
seq.head

The updated approach may be slightly faster for some collection classes (check List.apply code, for example). Additionally, property access is simpler (both syntactically and semantically) than method call with an argument.

Don’t retrieve last element by index
// Before
seq(seq.length - 1)

// After
seq.last

The latter expression is more obvious and allows to avoid redundant collection length computation (that might be slow for linear sequences). Besides, some collection classes can retrieve last element more efficiently comparing to by-index access.

Don’t check index bounds explicitly
// Before
if (i < seq.length) Some(seq(i)) else None

// After
seq.lift(i)

The second expression is semantically equivalent, yet more concise.

Don’t emulate headOption
// Before
if (seq.nonEmpty) Some(seq.head) else None
seq.lift(0)

// After
seq.headOption

The optimized expression is more concise.

Don’t emulate lastOption
// Before
if (seq.nonEmpty) Some(seq.last) else None
seq.lift(seq.length - 1)

// After
seq.lastOption

The optimized expression is more concise (and potentially faster).

Be careful with indexOf and lastIndexOf argument types

// Before
Seq(1, 2, 3).indexOf("1") // compilable
Seq(1, 2, 3).lastIndexOf("2") // compilable

//  After
Seq(1, 2, 3).indexOf(1)
Seq(1, 2, 3).lastIndexOf(2)

Because of how variance works, indexOf and lastIndexOf methods accept arguments of Any type. In practice, that might lead to hard-to-find bugs, which are not discoverable at compile time. That’s where auxiliary IDE inspections come in handy.

Don’t construct indices range manually
// Before
Range(0, seq.length)

// After
seq.indices

There’s a built-in method that returns the range of all indices of a sequence.

Don’t zip collection with its indices manually
// Before
seq.zip(seq.indices)

// After
seq.zipWithIndex

For one thing, the latter expression is more concise. Besides, we may expect some performance gain, because we avoid hidden length calculation (which might be expensive for linear sequences).

Additional benefit of the latter expression is that it works well with possibly infinite collections (like Stream).

Use IndexedSeq instance as a function value
// Before (seq: IndexedSeq[T])
Seq(1, 2, 3).map(seq(_))

// After
Seq(1, 2, 3).map(seq)

Because an instance of IndexedSeq[T] is also Function1[Int, T], you can use it directly as such.

4.5 Existence

Don’t use equality predicate to check element presence
// Before
seq.exists(_ == x)

//  After
seq.contains(x)

The second expression is semantically equivalent, yet more concise.

When those expressions are applied to Set classes, performance might be different as night and day, because sets offer close to O(1) lookups (due to internal indexing, which is left unused within exists calls).

Also applicable to: Set, Option, Iterator.

Be careful with contains argument type

// Before
Seq(1, 2, 3).contains("1") // compilable

//  After
Seq(1, 2, 3).contains(1)

Just like indexOf and lastIndexOf methods, contains accepts arguments of Any type, what might lead to hard-to-find bugs, which are not discoverable at compile time. Be careful with the method arguments.

Don’t use inequality predicate to check element absence
// Before
seq.forall(_ != x)

// After
!seq.contains(x)

Again, the latter expression is cleaner and possibly faster (especially, for sets).

Also applicable to: Set, Option, Iterator.

Don’t count occurrences to check existence
// Before
seq.count(p) > 0
seq.count(p) != 0
seq.count(p) == 0

//  After
seq.exists(p)
seq.exists(p)
!seq.exists(p)

Obviously, when we need to know whether a predicate holds for some elements of the collection, counting the number of elements which satisfy the predicate is redundant. The optimized expressions looks cleaner and performs better.

The predicate p must be pure.

Also applicable to: Set, Map, Iterator.

Don’t resort to filtering to check existence
// Before
seq.filter(p).nonEmpty
seq.filter(p).isEmpty

// After
seq.exists(p)
!seq.exists(p)

The call to filter creates an intermediate collection which takes heap space and loads GC. Besides, the former expressions find all occurrences, when only the first one is needed (which might slowdown code, depending on likely collection contents).

The potential performance gain is less significant for lazy collections (like Stream and, especially, Iterator).

The predicate p must be pure.

Also applicable to: Set, Option, Map, Iterator.

Don’t resort to search to check existence
// Before
seq.find(p).isDefined
seq.find(p).isEmpty

// After
seq.exists(p)
!seq.exists(p)

Search is definitely better than filtering, yet we can still do better (at least, in terms of clarity).

Also applicable to: Set, Option, Map, Iterator.

4.6 Filtering

Don’t negate filter predicate
// Before
seq.filter(!p)

// After
seq.filterNot(p)

The latter expression is syntactically simpler (while semantically they are equal).

Also applicable to: Set, Option, Map, Iterator.

Don’t resort to filtering to count elements
// Before
seq.filter(p).length

// After
seq.count(p)

The call to filter creates an intermediate collection (which is not really needed) that takes heap space and loads GC.

Also applicable to: Set, Option, Map, Iterator.

Don’t use filtering to find first occurrence
// Before
seq.filter(p).headOption

// After
seq.find(p)

Unless seq is a lazy collection (like Stream), filtering will find all occurrences (and create a temporary collection), while only the first one is needed.

Also applicable to: Set, Option, Map, Iterator.

4.7 Sorting

Don’t sort by a property manually
// Before
seq.sortWith(_.property <  _.property)

// After
seq.sortBy(_.property)

We have a special method for that, which is more clear and concise.

Don’t sort by identity manually
// Before
seq.sortBy(identity)
seq.sortWith(_ < _)

// After
seq.sorted

There’s a special method for that, more clear and concise.

Perform reverse sorting in one step
// Before
seq.sorted.reverse
seq.sortBy(_.property).reverse
seq.sortWith(f(_, _)).reverse

// After
seq.sorted(Ordering[T].reverse)
seq.sortBy(_.property)(Ordering[T].reverse)
seq.sortWith(!f(_, _))

That’s how we can avoid a temporary collection allocation and exclude the additional transformation step (to save both heap space and CPU cycles).

Don’t use sorting to find the smallest element
// Before
seq.sorted.head
seq.sortBy(_.property).head

// After
seq.min
seq.minBy(_.property)

The latter approach is more concise. Besides, it’s more effective, because it doesn’t create a temporary collection.

Don’t use sorting to find the largest element
// Before
seq.sorted.last
seq.sortBy(_.property).last

// After
seq.max
seq.maxBy(_.property)

Rationale is the same as in the previous tip.

4.8 Reduction

Don’t calculate sum manually
// Before
seq.reduce(_ + _)
seq.fold(z)(_ + _)

// After
seq.sum
seq.sum + z

The advantage is clarity and conciseness.

Other possible methods are: reduceLeft, reduceRight, foldLeft, foldRight.

The second transformation might be replaced with the first when z is 0.

Also applicable to: Set, Iterator.

Don’t calculate product manually
// Before
seq.reduce(_ * _)
seq.fold(z)(_ * _)

// After
seq.product
seq.product * z

Rationale is the same as in the previous tip.

The second transformation might be replaced with the first when z is 1.

Also applicable to: Set, Iterator.

Don’t search for the smallest element manually
// Before
seq.reduce(_ min _)
seq.fold(z)(_ min _)

// After
seq.min
z min seq.min

Rationale is the same as in the previous tips.

Also applicable to: Set, Iterator.

Don’t search for the largest element manually
// Before
seq.reduce(_ max _)
seq.fold(z)(_ max _)

// After
seq.max
z max seq.max

Rationale is the same as in the previous tips.

Also applicable to: Set, Iterator.

Don’t emulate forall
// Before
seq.foldLeft(true)((x, y) => x && p(y))
!seq.map(p).contains(false)

// After
seq.forall(p)

The goal of the simplification is clarity and conciseness.

The predicate p must be pure.

Also applicable to: Set, Option (for the second line), Iterator.

Don’t emulate exists
// Before
seq.foldLeft(false)((x, y) => x || p(y))
seq.map(p).contains(true)

// After
seq.exists(p)

Besides clarity and conciseness, the latter expression might be faster (because it stops element processing when a single occurrence is found) and it can work with infinite sequences.

The predicate p must be pure.

Also applicable to: Set, Option (for the second line), Iterator.

Don’t emulate map
// Before
seq.foldLeft(Seq.empty)((acc, x) => acc :+ f(x))
seq.foldRight(Seq.empty)((x, acc) => f(x) +: acc)

// After
seq.map(f)

That’s a “classical” implementation of map via folding in functional programming. While it’s surely didactic, there’s no need to resort to such a formulation when we have the concise built-in method (which is also faster, because it uses a simple while loop under the hood).

Also applicable to: Set, Option, Iterator.

Don’t emulate filter
// Before
seq.foldLeft(Seq.empty)((acc, x) => if (p(x)) acc :+ x else acc)
seq.foldRight(Seq.empty)((x, acc) => if (p(x)) x +: acc else acc)

// After
seq.filter(p)

Rationale is the same as in the previous tip.

Also applicable to: Set, Option, Iterator.

Don’t emulate reverse
// Before
seq.foldLeft(Seq.empty)((acc, x) => x +: acc)
seq.foldRight(Seq.empty)((x, acc) => acc :+ x)

// After
seq.reverse

Again, the built-in method is cleaner and faster.

Also applicable to: Set, Option, Iterator.

4.9 Matching

Here are several dedicated tips that are related to Scala’s pattern matching and partial functions.

Use partial function instead of function with pattern matching
// Before
seq.map {
  _ match {
    case P => ??? // x N
  }
}

// After
seq.map {
  case P => ??? // x N
}

The updated expression produces the same result, yet looks simpler.

It’s possible to apply this transformation to any functions, not only to map arguments. This tip is not actually collection-related at all. However, because higher-order functions are so ubiquitous in Scala collection API, the tip is especially handy.

Convert flatMap with partial function to collect
// Before
seq.flatMap {
  case P => Seq(???) // x N
  case _ => Seq.empty
}

// After
seq.collect {
  case P => ??? // x N
}

The updated expression produces the same result, but looks much simpler.

Also applicable to: Set, Option, Map, Iterator.

Convert match to collect when the result is a collection
// Before
v match {
  case P => Seq(???) // x N
  case _ => Seq.empty
}

// After
Seq(v) collect {
  case P => ??? // x N
}

When all the case statements produce collections, it’s possible to simplify the expression by converting the match statement to a collect call. In that way we can write collection creation only once and we can omit the explicit default case branch.

Personally, I often use this trick with Option rather than sequences per se.

Also applicable to: Set, Option, Iterator.

Don’t emulate collectFirst
// Before
seq.collect{case P => ???}.headOption

// After
seq.collectFirst{case P => ???}

We have a special method for such a use case, which also works faster on non-lazy collections.

The partial function must be pure.

Also applicable to: Set, Map, Iterator.

4.10 Rewriting

Merge consecutive filter calls
// Before
seq.filter(p1).filter(p2)

// After
seq.filter(x => p1(x) && p2(x))

That’s how we can avoid creation of an intermediate collection (after the first filter call), and thus relieve GC’s burden.

We may also apply a generalized approach that relies on views (see below), so the result will be seq.view.filter(p1).filter(p2).force.

The predicates p1 and p2 must be pure.

Also applicable to: Set, Option, Map, Iterator.

Merge consecutive map calls
// Before
seq.map(f).map(g)

// After
seq.map(f.andThen(g))

Again, we’re creating the resulting collection directly, without an intermediate one.

We may also apply a generalized approach that relies on views (see below), so the result will be seq.view.map(f).map(g).force.

The functions f and g must be pure.

Also applicable to: Set, Option, Map, Iterator.

Do sorting after filtering
// Before
seq.sorted.filter(p)

// After
seq.filter(p).sorted

Sorting is computationally expensive procedure. There’s no need to sort elements that will be potentially filtered out in the next step.

The same applies to the other possible sorting methods, like sortWith and sortBy.

The predicate p must be pure.

Don’t reverse collection explicitly before calling map
// Before
seq.reverse.map(f)

// After
seq.reverseMap(f)

The first expression creates an intermediate (reversed) collection before transforming elements, which is quite reasonable sometimes (e.g. for List). Other times it’s possible to perform the required transformation directly, without creating the intermediate collection, which is more efficient.

Don’t reverse collection explicitly to acquire reverse iterator
// Before
seq.reverse.iterator

// After
seq.reverseIterator

Again, the latter expression is simpler and might be more efficient.

Don’t convert collection to Set to find distinct elements
// Before
seq.toSet.toSeq

// After
seq.distinct

There’s no need to create a temporary set (at least explicitly) to find distinct elements.

Don’t emulate slice
// Before
seq.drop(x).take(y)

// After
seq.slice(x, x + y)

For linear sequences all we gain is clear intent and conciseness. For indexed sequences, however, we may expect a potential performance improvements.

Also applicable to: Set, Map, Iterator.

Don’t emulate splitAt
// Before
val seq1 = seq.take(n)
val seq2 = seq.drop(n)

// After
val (seq1, seq2) = seq.splitAt(n)

The optimized expression performs faster on linear sequences (like List or Stream), because it computes the results in a single pass.

Also applicable to: Set, Map.

Don’t emulate span
// Before
val seq1 = seq.takeWhile(p)
val seq2 = seq.dropWhile(p)

// After
val (seq1, seq2) = seq.span(p)

That’s how we can traverse the sequence and check the predicate only once, instead of twice.

The predicate p must be pure.

Also applicable to: Set, Map, Iterator.

Don’t emulate partition
// Before
val seq1 = seq.filter(p)
val seq2 = seq.filterNot(p)

// After
val (seq1, seq2) = seq.partition(p)

Again, the benefit is a single-pass computation.

The predicate p must be pure.

Also applicable to: Set, Map, Iterator.

Don’t emulate takeRight
// Before
seq.reverse.take(n).reverse

// After
seq.takeRight(n)

The latter expression is more concise and potentially more efficient (for both indexed- and linear sequences).

Don’t emulate flatten
// Before (seq: Seq[Seq[T]])
seq.reduce(_ ++ _)
seq.fold(Seq.empty)(_ ++ _)
seq.flatMap(identity)

// After
seq.flatten

There’s no need to flatten collections manually when we have a dedicated built-in method for doing exactly that.

Also applicable to: Set, Option, Iterator.

Don’t emulate flatMap
// Before (f: A => Seq[B])
seq.map(f).flatten

// After
seq.flatMap(f)

Again, there’s no need to emulate built-in method manually. Besides improved clarity, we can skip the creation of an intermediate collection.

Also applicable to: Set, Option, Iterator.

Don’t use map when result is ignored
// Before
seq.map(???) // the result is ignored

// After
seq.foreach(???)

When side effect is all that is needed, there’s no justification for calling map. Such a call is misleading (and less efficient).

Also applicable to: Set, Option, Map, Iterator.

Don’t use unzip to extract a single component
// Before (seq: Seq[(A, B]])
seq.unzip._1

// After
seq.map(_._1)

There’s no need to create additional collection(s) when only a single component is required.

Other possible method: unzip3.

Also applicable to: Set, Option, Map, Iterator.

Don’t create temporary collections

This recipe is subdivided into three parts (depending on transformation final result).

1. Transformation reduces collection to a single value.

// Before
seq.map(f).flatMap(g).filter(p).reduce(???)

// After
seq.view.map(f).flatMap(g).filter(p).reduce(???)

In place of reduce might be any method that reduces collection to a single value, for example: reduceLeft, reduceRight, fold, foldLeft, foldRight, sum, product, min, max, head, headOption, last, lastOption, indexOf, lastIndexOf, find, contains, exists, count, length, mkString, etc.

The exact order of transformations is not relevant — what matters is that we’re creating one or more temporary, intermediate collections, that are not needed by themselves, yet they will take heap space and burden GC. That happens because, by default, all collection transformers (map, flatMap, filter, ++, etc.) are “strict” (except for Stream) and construct a new collection with all its elements as a result of transformation.

That’s where views come to the rescue — we may think of view as some kind of Iterator, that allows re-iteration:

  • Views are “lazy” — elements constructed only when they are needed.
  • Views don’t hold created elements in memory (which even Stream does).

To go from a collection to its view, we use the view method.

2. Transformation produces a collection of the same class.

It’s possible to use views when the final result of transformation is still a collection — the force method will build a collection of the original class (while creation of all the intermediate collections is still avoided):

// Before
seq.map(f).flatMap(g).filter(p)

// After
seq.view.map(f).flatMap(g).filter(p).force

If the only intermediate transformation is filtering, you may consider using withFilter method as an alternative:

seq.withFilter(p).map(f)

The method is originally intended to be used in for comprehensions. It works much like a view — it creates a temporary object that restricts the domain of subsequent collection transformations (so, it reorders possible side effects). However, there’s no need to explicitly convert collection to / from temporary representation (by calling view and force).

While view-based approach is more universal, in this particular case you may prefer withFilter method because of the conciseness.

3. Transformation creates a collection of different class.

// Before
seq.map(f).flatMap(g).filter(p).toList

// After
seq.view.map(f).flatMap(g).filter(p).toList

This time we use a suitable converter method (instead of the generic force call), so the result will be a collection of different class.

There also exists an alternative way of handling “transformation + conversion” case that relies on breakOut:

seq.map(f)(collection.breakOut): List[T]

Such an expression is functionally equivalent to using a view, however this approach:

  • needs expected type to be explicit (which often requires an additional type annotation),
  • is limited to a single transformation (like map, flatMap, filter, fold, etc.),
  • looks rather tricky (because implicit builders are usually omitted from the Scala Collections API docs).

Thus, it’s usually better to substitute breakOut for a view, which is more clear, more concise and more flexible.

Views are especially efficient when collection size is relatively large.

All the functions (like f and g) and predicates (p) must be pure (because view might delay, skip or reorder computations).

Also applicable to: Set, Map.

Use assignment operators to reassign a sequence
// Before
seq = seq :+ x
seq = x +: seq
seq1 = seq1 ++ seq2
seq1 = seq2 ++ seq1

// After
seq :+= x
seq +:= x
seq1 ++= seq2
seq1 ++:= seq2

Scala offers a syntactic sugar known as “assignment operators” — it automatically converts x <op>= y statements into x = x <op> y where <op> is some character operator (like +, -, etc.). Note, that if <op> ends with : it is considered right-associative (i. e. invoked on the right expression, instead of the left).

There’s also a special syntax for lists and streams:

// Before
list = x :: list
list1 = list2 ::: list1

stream = x #:: stream
stream1 = stream2 #::: stream1

// After
list ::= x
list1 :::= list2

stream #::= x
stream1 #:::= stream2

The optimized statements are more concise.

Also applicable to: Set, Map, Iterator (with operator specifics in mind).

Don’t convert collection type manually
// Before
seq.foldLeft(Set.empty)(_ + _)
seq.foldRight(List.empty)(_ :: _)

// After
seq.toSet
seq.toList

We have corresponding build-it methods for doing that, which are cleaner and faster.

If you need to transform or filter values in the course of the conversion, consider using views or similar techniques, as described above.

Also applicable to: Set, Option, Iterator.

Be careful with toSeq on non-strict collections
// Before (seq: TraversableOnce[T])
seq.toSeq

// After
seq.toStream
seq.toVector

Because Seq(...) creates a strict collection (namely, Vector), we may be tempted to use toSeq to convert a non-strict entity (like Stream, Iterator, or “view”) to a strict collection. However, TraversableOnce.toSeq returns a Stream under the hood, which is a lazy collection, and that might result in hard-to-find bugs or performance problems. Even if a stream is exactly what you expect, such an expression might confuse other people who read your code.

Here’s a typical example of the pitfall:

val source = Source.fromFile("lines.txt")
val lines = source.getLines.toSeq
source.close()
lines.foreach(println)

Such a code will throw an IOException complaining that the stream is already closed.

To clarify the intent, it’s better to write toStream explicitly, or, if we need a strict collection, after all, to use toVector instead of toSeq.

Don’t convert to String manually
// Before (seq: Seq[String])
seq.reduce(_ + _)
seq.reduce(_ + separator + _)
seq.fold(prefix)(_ + _)
seq.map(_.toString).reduce(_ + _) // seq: Seq[T]
seq.foldLeft(new StringBuilder())(_ append _)

// After
seq.mkString
seq.mkString(prefix, separator, "")

The latter approach is cleaner and potentially faster, because it uses a single StringBuilder under the hood.

Other possible methods are: reduceLeft, reduceRight, foldLeft, foldRight.

Also applicable to: Set, Option, Iterator.

5. Sets

Most of the tips for sequences are applicable to sets as well. Additionally, there are several set-specific tips available.

Don’t use sameElements to compare unordered collections
// Before
set1.sameElements(set2)

// After
set1 == set2

This rule was introduced earlier (for sequences), yet for sets the rationale is even more sound.

The sameElements method might return indeterministic results for unordered collections, because this method respects order of elements, while we cannot rely on the order in a set.

Classes that explicitly guarantee predictable iteration order (like LinkedHashSet) are exceptions from this rule.

Also applicable to: Map.

Use Set instance as a function value
// Before (set: Set[Int])
Seq(1, 2, 3).filter(set(_))
Seq(1, 2, 3).filter(set.contains)

// After
Seq(1, 2, 3).filter(set)

Because an instance of Set[T] is also Function1[T, Boolean], you can use it directly as such.

Don’t compute set intersection manually
// Before
set1.filter(set2)

// After
set1.intersect(set2) // or set1 & set2

The latter expression is more clear and concise (while performance is the same).

This transformation can be performed on sequences, however we should keep in mind, that in such a case, duplicate elements will be handled differently.

Don’t compute set difference manually
// Before
set1.filterNot(set2)

// After
set1.diff(set2) // or set1 &~ set2

Again, the updated expression is more clear and concise (while performance is the same).

The transformation is potentially applicable to sequences, though we should consider duplicate elements.

6. Options

Technically, Option is not a part of the Scala Collections, however it provides similar interface (with monadic methods, etc.) and behaves like a special kind of collection that may or may not hold some value.

Many of the tips for sequences are also applicable to options. Besides, here are dedicated tips, that are based on Option API.

6.1 Value

Don’t compare option values with None
// Before
option == None
option != None

// After
option.isEmpty
option.isDefined

While such a comparison is fully legitimate, we have a more straightforward way to check whether option is defined (by analogy with collections: we normally write seq.isEmpty, not seq == Seq.empty).

Another advantage of this simplification, is that if you ever decide to change the type from Option[T] to just T, scalac will compile the former expression (yielding only a warning), while compilation of the latter one will result in a righteous error.

One more argument in favor of the latter approach is that it helps to “catch” null values that came instead of Option values (by error).

Don’t compare option values with Some
// Before
option == Some(v)
option != Some(v)

// After
option.contains(v)
!option.contains(v)

This tip is a complement to the previous one (with the same justification).

Don’t rely on instance type to check value existence
// Before
option.isInstanceOf[Some[_]]

// After
option.isDefined

There’s simply no need for such trickery.

Don’t resort to pattern matching to check value existence
// Before
option match {
    case Some(_) => true
    case None => false
}

option match {
    case Some(_) => false
    case None => true
}

// After
option.isDefined
option.isEmpty

Again, while the former expression is technically correct, there’s no justification for such an extravagance. Besides, the simplified expression will work faster.

Also applicable to: Seq, Set.

Don’t negate value existence-related properties
// Before
!option.isEmpty
!option.isDefined
!option.nonEmpty

// After
seq.isDefined
seq.isEmpty
seq.isEmpty

Rationale is the same as the one for sequences — simple property adds less visual clutter than a compound expression.

Note that we have the synonyms: isDefined (option-specific) and nonEmpty (sequence-specific). It might be reasonable to prefer the former one, to distinguish options from sequences.

6.2 Null

Don’t compare value with null explicitly to construct an Option
// Before
if (v != null) Some(v) else None

// After
Option(v)

We have a special, concise syntax for that.

Don’t use Option(...) with a constant
// Before
Option("constant")

// After
Some("constant")

When an argument to Option(...) cannot be null, it’s better to avoid the redundant null check and to encode value existence at the type level.

Don’t provide null as an explicit alternative
// Before
option.getOrElse(null)

// After
option.orNull

We can rely on the predefine method in such a case, so the expression will become shorter.

6.3 Processing

It’s possible to single out a groups of tips that are related to how Option value is processed.

As Option‘s API documentation says that “the most idiomatic way to use an Option instance is to treat it as a collection or monad and use map, flatMap, filter, or foreach”, the basic principle here is to avoid “check & get” chains, implemented either via if statement or via pattern matching.

The goal is conciseness and robustness, the “monadic” code is:

  • more concise and intelligible,
  • safeguarded against NoSuchElementException nor MatchError exceptions at runtime.

This rationale is common for all the following cases.

Don’t emulate getOrElse
// Before
if (option.isDefined) option.get else z

option match {
  case Some(it) => it
  case None => z
}

// After
option.getOrElse(z)
Don’t emulate orElse
if (option1.isDefined) option1 else option2

option1 match {
  case Some(it) => Some(it)
  case None => option2
}

// After
option1.orElse(option2)
Don’t emulate exists
// Before
option.isDefined && p(option.get)

if (option.isDefined) p(option.get) else false

option match {
  case Some(it) => p(it)
  case None => false
}

// After
option.exists(p)
Don’t emulate forall
// Before
option.isEmpty || (option.isDefined && p(option.get))

if (option.isDefined) p(option.get) else true

option match {
  case Some(it) => p(it)
  case None => true
}

// After
option.forall(p)
Don’t emulate contains
// Before
option.isDefined && option.get == x

if (option.isDefined) option.get == x else false

option match {
  case Some(it) => it == x
  case None => false
}

// After
option.contains(x)
Don’t emulate foreach
// Before
if (option.isDefined) f(option.get)

option match {
  case Some(it) => f(it)
  case None =>
}

// After
option.foreach(f)
Don’t emulate filter
// Before
if (option.isDefined && p(option.get)) option else None

option match {
  case Some(it) && p(it) => Some(it)
  case _ => None
}

// After
option.filter(p)
Don’t emulate map
// Before
if (option.isDefined) Some(f(option.get)) else None

option match {
  case Some(it) => Some(f(it))
  case None => None
}

// After
option.map(f)
Don’t emulate flatMap
// Before (f: A => Option[B])
if (option.isDefined) f(option.get) else None

option match {
  case Some(it) => f(it)
  case None => None
}

// After
option.flatMap(f)

6.4 Rewriting

Convert map with getOrElse to fold
// Before
option.map(f).getOrElse(z)

// After
option.fold(z)(f)

Those expressions are semantically equivalent (in both cases z is computed lazily, on demand), yet the latter expression is more concise. The transformation may sometimes require an additional type annotation (because of the way how Scala’s type inference works), and, in such cases, the former expression might be preferable.

Keep in mind that this particular simplification is somehow controversial, because the latter expression looks more obscure, especially if you not accustomed to it.

Don’t emulate exists
// Before
option.map(p).getOrElse(false)

// After
option.exists(p)

We presented a rather similar rule for sequences (which is also directly applicable to options). This transformation is a special case with getOrElse call.

Don’t emulate flatten
// Before (option: Option[Option[T]])
option.map(_.get)
option.getOrElse(None)

// After
option.flatten

The latter expression looks cleaner.

Don’t convert option to sequence manually
// Before
option.map(Seq(_)).getOrElse(Seq.empty)
option.getOrElse(Seq.empty) // option: Option[Seq[T]]

// After
option.toSeq

There’s a special method for doing that, which is more concise and, less computationally-expensive.

7. Maps

Just like with other collection classes, many tips for sequences are applicable to maps as well, so we’ll enumerate only map-specific ones.

Don’t search for a value manually

// Before
map.find(_._1 == k).map(_._2)

// After
map.get(k)

While in principe the former code will work, the performance is sub-optimal, because Map is not a simple collection of (key, value) pairs — it can perform lookups in a much more efficient way.

Moreover, the latter expression is simple and easier to undestand.

Don’t use get when a raw value is needed
// Before
map.get(k).get

// After
map(k) 

There’s no need to produce an intermediate Option value when a raw value is needed.

Don’t use lift instead of get
// Before
map.lift(k)

// After
map.get(k)

There’s no need to treat map value as a partial function to acquire an optional result (which is useful for sequences), because we have a built-in method with the same functionality.

Don’t call get with getOrElse separately
// Before
map.get(k).getOrElse(z)

// After
map.getOrElse(k, z)

The single method call is simpler, both syntax- and performance-wise. In both cases z is computed lazily, on demand.

Use Map instance as a function value
// Before (map: Map[Int, T])
Seq(1, 2, 3).map(map(_))

// After
Seq(1, 2, 3).map(map)

Because an instance of Map[K, V] is also Function1[K, V], you can use it directly as such.

Don’t extract keys manually
// Before
map.map(_._1)
map.map(_._1).toSet
map.map(_._1).toIterator

// After
map.keys
map.keySet
map.keysIterator

The optimized expressions are more intelligible (and might be faster).

Don’t extract values manually
// Before
map.map(_._2)
map.map(_._2).toIterator

// After
map.values
map.valuesIterator

The simplified expressions are more comprehensible (and potentially faster).

Be careful with filterKeys
// Before
map.filterKeys(p)

// After
map.filter(p(_._1))

The filterKeys methods wraps the original map without copying any elements. There’s nothing wrong with that per se, however such a behaviour is hardly expected from filterKeys method. Because it unexpectedly behaves like a view, code performance might be substantially degraded in some cases (e. g. for filterKeys(p).groupBy(???)).

Another possible pitfall is unexpected “laziness” (collection transformers are expected to be strict by default) – the predicated is not evaluated at all withing the method call itself, so possible side effects might be reordered.

The filterKeys method should probably be deprecated, because it’s now impossible to make it strict without breaking backward compatibility. A more suitable name for the current implementation is withKeyFilter (by analogy with the withFilter method).

All in all, it seems reasonable to follow the Rule of Least Surprise and to filter keys manually.

Nevertheless, as the view-like functionality of filterKeys is potentially useful (when only a few entries will be accessed while the map is relatively large), we may still consider using the method. To keep other people who read (or modify) our code from confusion, it’s better clarify our intent by defining an appropriate method synonym:

type MapView[A, +B] = Map[A, B]

implicit class MapExt[A, +B](val map: Map[A, B]) extends AnyVal {
  def withKeyFilter(p: A => Boolean): MapView[A, B] =
    map.filterKeys(p)
}

We’re using the type alias MapView to indicate that the result map is view-like.

Another option is to define a simple helper method, like:

def get(k: T) = if (p(k)) map.get(k) else None
Be careful with mapValues
// Before
map.mapValues(f)

// After
map.map(f(_._2))

Rationale is the same as in the previous tip. In a similar way, we may define an unambiguous synonym:

type MapView[A, +B] = Map[A, B]

implicit class MapExt[A, +B](val map: Map[A, B]) extends AnyVal {
  def withValueMapper[C](f: B => C): MapView[A, C] =
    map.mapValues(f)
}

A simpler option would be to define a helper method, like:

def get(k: T) = map.get(k).map(f)
Don’t filter out keys manually
// Before
map.filterKeys(!seq.contains(_))

// After
map -- seq

We can rely on the simpler syntax to filter out keys.

Use assignment operators to reassign a map
// Before
map = map + x -> y
map1 = map1 ++ map2
map = map - x
map = map -- seq

// After
map += x -> y
map1 ++= map2
map -= x
map --= seq

As with sequences, we may rely on syntactic sugar to simplify such statements.

8. Supplement

In addition to the listed recipes, I recommend you to take a look at the Scala Collections documentation, which is surprisingly good an easy to read.

See also:

  • Scala for Project Euler — Concise functional Scala solutions to the Project Euler problems.
  • Ninety-nine — Ninety-nine problems in Scala, Java, Clojure and Haskell (with multiple solutions)

Those puzzles really helped me to shape and deepen my understanding of Scala collections.

While the list of tips is rather comprehensive, it’s probably far from complete. Moreover, because applicability of the recipes varies greatly, some fine-tuning is definitely needed. Your suggestions welcome.

32 Comments

  1. Mark T. Kennedy says:

    hi, pavel. quick question – is this fragment from your first example valid:b

    seq filter { x => builder.append(x); x > 3 } headOption

    or does it need to be something like this:

    seq filter { x => { builder.append(x); x > 3 } headOption

  2. Mark T. Kennedy says:

    sorry, that last line should read:

    seq filter { x => { builder.append(x); x > 3 } } headOption

  3. Pavel says:

    Hi Mark!

    Yes, such a code is syntactically correct, though it looks a little bit unusual as a one-liner. Here’s a more conventional formulation (we’re using line separators instead of the semicolons):

    seq.filter { x =>
        builder.append(x)
        x > 3
    }

  4. Pavel says:

    Thanks to the comments on Reddit I’ve actually inverted the recommendations on filterKeys and mapValues methods.

  5. Hugo says:

    Hi Pavel,

    Regarding “Don’t resort to map to transform values”: I think you should add the mapValues() does not create a new map, but merely a ‘view’ on the original map. This means the transformation is not executed when you map the values, but later when you access them (on the view). Sometimes this may be exactly what you want, but oftentimes it is not (and you have to resort to map()). I know I’ve been bitten by it in the past, when I assumed a new Map to be created.

    Groets, Hugo.

  6. Pavel says:

    Hi Hugo! I totally agree – I have already inverted the recommendations on filterKeys and mapValues methods (and added an explanation). Thanks for the additional notice 🙂

  7. HolyHaddock says:

    Very comprehensive!

    “Don’t retrieve (first|last) element by index” – is it worth suggesting that the total functions headOption and lastOption are generally more useful than head or last, either here or as separate points?

  8. Javin says:

    Indeed great collection of tips, I am learning Scala and this will be very handy once I get going. Thanks a lot.

  9. […] the Intro to Collections, and its sections on views and performance characteristics. Seq traits.  Collections tips and tricks. Language […]

  10. Vasiliy says:

    Great tips!

    What do you think about collection.breakOut when transforming collections?
    http://stackoverflow.com/questions/1715681/scala-2-8-breakout/1716558#1716558

  11. Pavel says:

    HolyHaddock, this list of recipes is not intended to be a general-purpose guide, so the already existing, separate tips on headOption and lastOption seem to be enough.

  12. Pavel says:

    Hi Vasiliy!

    I did consider breakOut (and even created SCL-8276) but then changed my mind. Here’s why:

    1. breakOut is too tricky, because the implicit builders are usually omitted from public API docs.

    2. breakOut is less flexible, because it needs expected type to be explicit (which often requires an additional type annotation), and because it is restricted to “single transformation / conversion” case, while view can handle arbitrary number of transformations.

    3. We have a more general mechanism – views, that are more clear and more flexible.

    The SO example can be rewritten as:

    List("London", "Paris").view.map(x => (x.length, x)).toMap

    – shorter and cleaner (and doesn’t require the additional import statement).

    I’ll probably add some note regarding the breakOut. Thank you for the notice!

  13. Vasiliy says:

    Hi Pavel,

    This answer says that in some cases breakOut can be helpful even with views: http://stackoverflow.com/questions/21285620/scala-collection-breakout-vs-views/21286268#21286268

    Also, a tip regarding options, I’ve seen this in some projects:

    option.map(set + _).getOrElse(set)

    If we treat option as a max-one-element set (or seq) then it is obviously:

    set ++ option

  14. Pavel says:

    Hi Vasiliy,

    It seems that the SO answer is not entirely correct, because collection.view.map(somefn)(breakOut) throws an exceptions (it turns out, that view and breakOut are not too compatible), and collection.view.map(someFn) results in a view that is not converted back to a collection (so those lines are not equivalent in principle).

    Here are workable and comparable expressions:

    1. seq.map(f)(breakOut): List[Int]

    2. seq.view.map(f).toList

    As far as I understand, these expressions are equivalent in terms of performance (you may check how the toList is implemented).

    There’s a tiniest overhead because of a view creation, but it is negligible in most practical uses (and perfectly acceptable, considering the advantages listed above).

    The option-related inspection looks relevant, I will, probably, append it to the list, thanks!

  15. Pavel says:

    I’ve added info on breakOut to the “Don’t create temporary collections” tip.

  16. Jacob says:

    You write:

    // Before
    seq.drop(x).take(y)

    // After
    seq.slice(x, y + 1)

    These are not equivalent; I believe perhaps you meant to write seq.slice(x, x + y)

  17. Pavel says:

    Hi Jacob!

    Indeed… fixed. That could have been a really “useful” inspection 🙂

    Thanks!

  18. Rotem says:

    Hi Pavel,
    Great post which should be on every developer desk.

    About the filterKeys and mapValues methods: You wrote in the comments that you inverted your recommendations. I didn’t saw the original recommendations so I’ll refer only to the currents:
    For both you recommend “Avoid using” with the argument that code performance might be substantially degraded in some case. It’s true, but for other cases is may result if better performances. In general, access the ‘view’ map less times that the map size, it will both perform better( and save memory).
    Though I agree that the name is misleading and I think the docs should be more explicit about it’s laziness.
    I don’t think there should be recommendation against using them, but note about it’s pitfall indeed may help.

  19. Pavel says:

    Hi Rotem,

    I usually object to compromising code clarity in favor of some fragile performance gain. However, in those cases we can actually make the best of both worlds, by defining unambiguous synonyms.

    I agree that the previous “avoid using” phrasing is a bit too categorical, so I’ve changed the titles to “be careful with” and substantially expanded the explanation.

    Thanks for the valuable suggestion!

  20. […] Scala Collections Tips and Tricks – a pretty useful (especially for novices in Scala) set of DOs and DON’Ts related to Scala collection facilities. […]

  21. David B. says:

    You should add a line about filter vs withFilter. It may not be obvious for begginners.

    // Before
    seq.filter(p1).map(p2)

    // After
    seq.withFilter(p1).map(p2)

    // Alternative for comprehension notation:
    for( e <- seq if p1 ) yield p2

  22. Pavel says:

    Hi David!

    I initially considered withFilter, however, just like with breakOut case, we have a better alternative – views, which is more clear and universal (withFilter was intended to be a part of internal for comprehensions implementation).

    So, seq.withFilter(p).map(f) can be replaced with seq.view.filter(p).map(f).force.

    Nevertheless, the withFilter method definitely worth mentioning, thanks for the tip!

  23. David B says:

    Thank you for your reply Pavel !

    I understand the analogy with breakOut but I think it doesn’t have the same constraints than it (i.e. specifiying the type of the output collection).

    For me the only difference is the syntax. And the withFilter one, even if not universal, is shorter.

    Secondly, in my comprehension of views, their usage involves the creation of an intermediate object to handle the transformations and this is not the case for the withFilter approach. It may lead to slightly better performance, even if i guess it would be very marginal in most cases.

    Thus, I think it would be great for the reader to suggest this alternative solution, as you did for breakOut.

  24. Pavel says:

    Like views, withFilter method creates an intermediate object (namely, WithFilter instance). However, there’s no need to explicitly convert collection to / from temporary representation (by calling view and force), so, in this particular case, the withFilter form is indeed much more concise.

    I’ve added an explanation of this alternative. Thanks!

  25. Philippus says:

    thank you for the tips and tricks. I found a typo:
    seq.spiltAt(n) should be seq.splitAt(n)

  26. Pavel says:

    Fixed. Thanks!

  27. PaulP says:

    Hi Pavel, don’t you think that this code sample

    `seq filter { x => builder.append(x); x > 3 } headOption`

    could be rewritten to remove postfix operator call.
    Newbies that learning language may get confused.

    Thanks, Paul

  28. Pavel says:

    Hi! Now, when postfix operators need to be explicitly enabled, this makes even more sense. I’ve updated the example. Thanks!

  29. Denis says:

    Hi 🙂

    list1 = list2 ::: list

    Shuld `list` be replaced with `list1` here?

  30. Pavel says:

    Hi Denis,

    Indeed, it should. Thanks!

  31. Philippe Derome says:

    The Option discussion on Don’t Emulate flatten has a little problem:

    The 2nd before line seems equivalent to the suggested after line, but the very first before line is problematic as it can throw exception (if option == Some(Option.empty).

    // Before (option: Option[Option[T]])
    option.map(_.get)
    option.getOrElse(None)

    // After
    option.flatten

  32. Alexander Petrossian (PAF says:

    Pavel, please reconsider diff recommendation.
    For two SortedSet-s it is currently O(n^2).
    While filterNot is much faster: O(n) + n*O(log(n)).