Scala Collections Tips and Tricks
This article presents a list of simplifications and optimizations of typical Scala Collections API usages.
Some of the tips rest upon subtle implementation details, though most of the recipes are just common sense transformations that, in practice, are often overlooked.
The list is inspired by my efforts to devise practical Scala Collections inspections for the IntelliJ Scala plugin. We’re now in process of implementing those inspections, so if you use the plugin in IDEA, you’ll automatically benefit from static code analysis.
Nevertheless, the recipes are valuable by themselves and can help you to deepen your understanding of Scala Collections and to make your code faster and cleaner.
Update:
If you feel adventurous, you can learn how to contribute to IntelliJ Scala plugin and try your hand at “up for grabs” collection inspections that are to your liking.
The article is also (independently) translated into Russian and Chinese (one more translation).
Contents:
- Legend
- Composition
- Side effects
- Sequences
4.1. Creation
4.2. Length
4.3. Equality
4.4. Indexing
4.5. Existence
4.6. Filtering
4.7. Sorting
4.8. Reduction
4.9. Matching
4.10. Rewriting - Sets
- Options
6.1. Value
6.2. Null
6.3. Processing
6.4. Rewriting - Maps
- Supplement
All the code examples are available as a GitHub repository.
1. Legend
To make code examples more comprehensible, I adhered to the following notation conventions:
seq
— an instance ofSeq
-based collection, likeSeq(1, 2, 3)
set
— an instance ofSet
, for exampleSet(1, 2, 3)
array
— an array, e. g.Array(1, 2, 3)
option
— an instance ofOption
, for example,Some(1)
map
— an instance ofMap
, likeMap(1 -> "foo", 2 -> "bar")
???
— an arbitrary expressionp
— a predicate function with typeT => Boolean
, like_ > 2
n
— an integer valuei
— an integer indexf
,g
— simple functions,A => B
x
,y
— some arbitrary valuesz
— initial or default valueP
— a pattern
2. Composition
Keep in mind that although all the recipes are isolated and self-contained, we can compose them to iteratively transform more advanced expressions, for example:
seq.filter(_ == x).headOption != None
// Via seq.filter(p).headOption -> seq.find(p)
seq.find(_ == x) != None
// Via option != None -> option.isDefined
seq.find(_ == x).isDefined
// Via seq.find(p).isDefined -> seq.exists(p)
seq.exists(_ == x)
// Via seq.exists(_ == x) -> seq.contains(x)
seq.contains(x)
So, we can rely on “substitution model of recipe application” (by analogy with SICP) to simplify complex expressions.
3. Side Effects
“Side effect” is an essential concept that should be reviewed before enumerating the actual transformations.
Basically, side effect is any action besides returning a value, that is observable outside of a function or expression scope, like:
- input/output operation,
- variable modification (that is accessible outside of the scope),
- object state modification (that is observable outside of the scope),
- throwing an exception (that is not caught within the scope).
When a function or expression contain any of these actions, they are said to have side effects, otherwise they are said to be “pure”.
Why side effects are such a big deal? Because in the presence of side effects, the order of evaluation matters. For example, here are two “pure” expressions (assigned to the corresponding values):
val x = 1 + 2
val y = 2 + 3
Because they contain no side effects (i. e. effects that are observable outside of the expressions), we may actually evaluate those expressions in any order — x
then y
, or y
then x
— it doesn’t disrupt evaluation correctness (we may even cache the result values, if we want to). Now let’s consider the following modification:
val x = { print("foo"); 1 + 2 }
val y = { print("bar"); 2 + 3 }
That’s another story — we cannot reverse the order of evaluation, because, that way, “barfoo” instead of “foobar” will be printed to console (and that is not what we expect).
So, the presence of side effects reduces the number of possible transformations (including simplifications and optimizations) that we may apply to code.
The same reasoning can be applied to collection-related expressions. Let’s imagine that we have some out-of-scope builder
(with side-effecting method append
):
seq.filter(x => { builder.append(x); x > 3 }).headOption
In principle, seq.filter(p).headOption
construct is reducible to seq.find(p)
call, however the side effect presence forbids us to do that. Here’s an attempt:
seq.find(x => { builder.append(x); x > 3 })
While those expressions are equivalent in terms of result value, they are not equivalent in regard to side effects. The former expression will append all the elements, while the latter one will omit elements after a first predicate match. So, such a simplification cannot be performed.
What can we do to make the automatic simplification possible? The answer is a golden rule that should be applied to all side effects in our code (including code with no collections at all):
- avoid side effects, whenever possible,
- isolate side effect from pure code, otherwise.
Thus, we need either to get rid of the builder
(with its side-effecting API), or separate the call to builder
from the pure expression. Let’s assume that this builder
is some third-party object that we cannot get rid of, so we have to isolate the call:
seq.foreach(builder.append)
seq.filter(_ > 3).headOption
Now we can safely apply the transformation:
seq.foreach(builder.append)
seq.find(x > 3)
Nice and clean! The isolation of the side effect made automatic transformation possible. An additional benefit is that, because of the clean separation, the resulted code is easier to comprehend by humans.
Less obvious, yet very important benefit of side effect isolation, is that our code becomes more robust, irrelatively of the possible simplifications. As applied to the example, here’s why: the original expression might produce different side effects depending on the actual Seq
implementation — with Vector
, for example, it will append all the elements, yet with Stream
it will omit elements after a first predicate match (because streams are “lazy” — elements are only evaluated when they are needed). The side effect isolation allowed us to avoid this indeterminate behavior.
4. Sequences
While tips in the current section are intended for classes that are descendants of Seq
, some transformations are applicable to other collection (and non-collection) classes, like Set
, Option
, Map
and even Iterator
(because all of them provide similar interfaces with monadic methods).
4.1 Creation
Create empty collections explicitly
// Before
Seq[T]()
// After
Seq.empty[T]
Some immutable collection classes provide singleton “empty” implementations, however not all of the factory methods check length of the created collections. Thus, by making collection emptiness apparent at compile time, we could save either heap space (by reusing empty collection instances) or CPU cycles (otherwise wasted on runtime length checks).
Also applicable to: Set
, Option
, Map
, Iterator
.
4.2 Length
Prefer length
to size
for arrays
// Before
array.size
// After
array.length
While size
and length
are basically synonyms, in Scala 2.11 Array.size
calls are still implemented via implicit conversion, so that intermediate wrapper objects are created for every method call. Unless you enable escape analysis in JVM , those temporary objects will burden GC and can potentially degrade code performance (especially, within loops).
Don’t negate emptiness-related properties
// Before
!seq.isEmpty
!seq.nonEmpty
// After
seq.nonEmpty
seq.isEmpty
Simple property adds less visual clutter than a compound expression.
Also applicable to: Set
, Option
, Map
, Iterator
.
Don’t compute length for emptiness check
// Before
seq.length > 0
seq.length != 0
seq.length == 0
// After
seq.nonEmpty
seq.nonEmpty
seq.isEmpty
On the one hand, simple property is more easy to perceive than compound expression. On the other hand — collections that are decedents of LinearSeq
(like List
) can take O(n)
time to compute length (instead of O(1)
time for IndexedSeq
), so we can speedup our code by avoiding length computations when exact value is not really required.
Additionally, keep in mind that .length
call on an infinite stream will never complete, yet we can always verify stream emptiness directly.
Also applicable to: Set
, Map
.
Don’t compute full length for length matching
// Before
seq.length > n
seq.length < n
seq.length == n
seq.length != n
// After
seq.lengthCompare(n) > 0
seq.lengthCompare(n) < 0
seq.lengthCompare(n) == 0
seq.lengthCompare(n) != 0
Because length calculation might be “expensive” computation for some collection classes, we can reduce comparison time from O(length)
to O(length min n)
for decedents of LinearSeq
(which might be hidden behind Seq
-typed values).
Besides, such approach is indispensable when we’re dealing with infinite streams.
Don’t use exists
for emptiness check
// Before
seq.exists(_ => true)
seq.exists(const(true))
// After
seq.nonEmpty
It goes without saying that such a trick is completely redundant.
Also applicable to: Set
, Option
, Map
, Iterator
.
4.3 Equality
Don’t rely on ==
to compare array contents
// Before
array1 == array2
// After
array1.sameElements(array2)
Equality check always produces false
for different array instances.
Also applicable to: Iterator
.
Don’t check equality between collections in different categories
// Before
seq == set
// After
seq.toSet == set
Equality checks cannot be used to compare collections in different categories (i. e. List
vs. Set
).
Please note that you should think twice about exact meaning of such a check (in reference to the example — how to treat duplicates in the sequence).
Don’t use sameElements
to compare ordinary collections
// Before
seq1.sameElements(seq2)
// After
seq1 == seq2
Equality check is the way to go when we’re comparing collections in the same category. In theory, that might improve performance because of possible underlying instance check (eq
, which is usually much faster).
Don’t check equality correspondence manually
// Before
seq1.corresponds(seq2)(_ == _)
// After
seq1 == seq2
We have a built-in method that does the same. Both expressions respect order of elements. Again, we might benefit from performance improvement.
4.4 Indexing
Don’t retrieve first element by index
// Before
seq(0)
// After
seq.head
The updated approach may be slightly faster for some collection classes (check List.apply
code, for example). Additionally, property access is simpler (both syntactically and semantically) than method call with an argument.
Don’t retrieve last element by index
// Before
seq(seq.length - 1)
// After
seq.last
The latter expression is more obvious and allows to avoid redundant collection length computation (that might be slow for linear sequences). Besides, some collection classes can retrieve last element more efficiently comparing to by-index access.
Don’t check index bounds explicitly
// Before
if (i < seq.length) Some(seq(i)) else None
// After
seq.lift(i)
The second expression is semantically equivalent, yet more concise.
Don’t emulate headOption
// Before
if (seq.nonEmpty) Some(seq.head) else None
seq.lift(0)
// After
seq.headOption
The optimized expression is more concise.
Don’t emulate lastOption
// Before
if (seq.nonEmpty) Some(seq.last) else None
seq.lift(seq.length - 1)
// After
seq.lastOption
The optimized expression is more concise (and potentially faster).
Be careful with indexOf
and lastIndexOf
argument types
// Before
Seq(1, 2, 3).indexOf("1") // compilable
Seq(1, 2, 3).lastIndexOf("2") // compilable
// After
Seq(1, 2, 3).indexOf(1)
Seq(1, 2, 3).lastIndexOf(2)
Because of how variance works, indexOf
and lastIndexOf
methods accept arguments of Any
type. In practice, that might lead to hard-to-find bugs, which are not discoverable at compile time. That’s where auxiliary IDE inspections come in handy.
Don’t construct indices range manually
// Before
Range(0, seq.length)
// After
seq.indices
There’s a built-in method that returns the range of all indices of a sequence.
Don’t zip collection with its indices manually
// Before
seq.zip(seq.indices)
// After
seq.zipWithIndex
For one thing, the latter expression is more concise. Besides, we may expect some performance gain, because we avoid hidden length calculation (which might be expensive for linear sequences).
Additional benefit of the latter expression is that it works well with possibly infinite collections (like Stream
).
Use IndexedSeq
instance as a function value
// Before (seq: IndexedSeq[T])
Seq(1, 2, 3).map(seq(_))
// After
Seq(1, 2, 3).map(seq)
Because an instance of IndexedSeq[T]
is also Function1[Int, T]
, you can use it directly as such.
4.5 Existence
Don’t use equality predicate to check element presence
// Before
seq.exists(_ == x)
// After
seq.contains(x)
The second expression is semantically equivalent, yet more concise.
When those expressions are applied to Set
classes, performance might be different as night and day, because sets offer close to O(1)
lookups (due to internal indexing, which is left unused within exists
calls).
Also applicable to: Set
, Option
, Iterator
.
Be careful with contains
argument type
// Before
Seq(1, 2, 3).contains("1") // compilable
// After
Seq(1, 2, 3).contains(1)
Just like indexOf
and lastIndexOf
methods, contains
accepts arguments of Any
type, what might lead to hard-to-find bugs, which are not discoverable at compile time. Be careful with the method arguments.
Don’t use inequality predicate to check element absence
// Before
seq.forall(_ != x)
// After
!seq.contains(x)
Again, the latter expression is cleaner and possibly faster (especially, for sets).
Also applicable to: Set
, Option
, Iterator
.
Don’t count occurrences to check existence
// Before
seq.count(p) > 0
seq.count(p) != 0
seq.count(p) == 0
// After
seq.exists(p)
seq.exists(p)
!seq.exists(p)
Obviously, when we need to know whether a predicate holds for some elements of the collection, counting the number of elements which satisfy the predicate is redundant. The optimized expressions looks cleaner and performs better.
The predicate p
must be pure.
Also applicable to: Set
, Map
, Iterator
.
Don’t resort to filtering to check existence
// Before
seq.filter(p).nonEmpty
seq.filter(p).isEmpty
// After
seq.exists(p)
!seq.exists(p)
The call to filter
creates an intermediate collection which takes heap space and loads GC. Besides, the former expressions find all occurrences, when only the first one is needed (which might slowdown code, depending on likely collection contents).
The potential performance gain is less significant for lazy collections (like Stream
and, especially, Iterator
).
The predicate p
must be pure.
Also applicable to: Set
, Option
, Map
, Iterator
.
Don’t resort to search to check existence
// Before
seq.find(p).isDefined
seq.find(p).isEmpty
// After
seq.exists(p)
!seq.exists(p)
Search is definitely better than filtering, yet we can still do better (at least, in terms of clarity).
Also applicable to: Set
, Option
, Map
, Iterator
.
4.6 Filtering
Don’t negate filter predicate
// Before
seq.filter(!p)
// After
seq.filterNot(p)
The latter expression is syntactically simpler (while semantically they are equal).
Also applicable to: Set
, Option
, Map
, Iterator
.
Don’t resort to filtering to count elements
// Before
seq.filter(p).length
// After
seq.count(p)
The call to filter
creates an intermediate collection (which is not really needed) that takes heap space and loads GC.
Also applicable to: Set
, Option
, Map
, Iterator
.
Don’t use filtering to find first occurrence
// Before
seq.filter(p).headOption
// After
seq.find(p)
Unless seq
is a lazy collection (like Stream
), filtering will find all occurrences (and create a temporary collection), while only the first one is needed.
Also applicable to: Set
, Option
, Map
, Iterator
.
4.7 Sorting
Don’t sort by a property manually
// Before
seq.sortWith(_.property < _.property)
// After
seq.sortBy(_.property)
We have a special method for that, which is more clear and concise.
Don’t sort by identity manually
// Before
seq.sortBy(identity)
seq.sortWith(_ < _)
// After
seq.sorted
There’s a special method for that, more clear and concise.
Perform reverse sorting in one step
// Before
seq.sorted.reverse
seq.sortBy(_.property).reverse
seq.sortWith(f(_, _)).reverse
// After
seq.sorted(Ordering[T].reverse)
seq.sortBy(_.property)(Ordering[T].reverse)
seq.sortWith(!f(_, _))
That’s how we can avoid a temporary collection allocation and exclude the additional transformation step (to save both heap space and CPU cycles).
Don’t use sorting to find the smallest element
// Before
seq.sorted.head
seq.sortBy(_.property).head
// After
seq.min
seq.minBy(_.property)
The latter approach is more concise. Besides, it’s more effective, because it doesn’t create a temporary collection.
Don’t use sorting to find the largest element
// Before
seq.sorted.last
seq.sortBy(_.property).last
// After
seq.max
seq.maxBy(_.property)
Rationale is the same as in the previous tip.
4.8 Reduction
Don’t calculate sum manually
// Before
seq.reduce(_ + _)
seq.fold(z)(_ + _)
// After
seq.sum
seq.sum + z
The advantage is clarity and conciseness.
Other possible methods are: reduceLeft
, reduceRight
, foldLeft
, foldRight
.
The second transformation might be replaced with the first when z
is 0
.
Also applicable to: Set
, Iterator
.
Don’t calculate product manually
// Before
seq.reduce(_ * _)
seq.fold(z)(_ * _)
// After
seq.product
seq.product * z
Rationale is the same as in the previous tip.
The second transformation might be replaced with the first when z
is 1
.
Also applicable to: Set
, Iterator
.
Don’t search for the smallest element manually
// Before
seq.reduce(_ min _)
seq.fold(z)(_ min _)
// After
seq.min
z min seq.min
Rationale is the same as in the previous tips.
Also applicable to: Set
, Iterator
.
Don’t search for the largest element manually
// Before
seq.reduce(_ max _)
seq.fold(z)(_ max _)
// After
seq.max
z max seq.max
Rationale is the same as in the previous tips.
Also applicable to: Set
, Iterator
.
Don’t emulate forall
// Before
seq.foldLeft(true)((x, y) => x && p(y))
!seq.map(p).contains(false)
// After
seq.forall(p)
The goal of the simplification is clarity and conciseness.
The predicate p
must be pure.
Also applicable to: Set
, Option
(for the second line), Iterator
.
Don’t emulate exists
// Before
seq.foldLeft(false)((x, y) => x || p(y))
seq.map(p).contains(true)
// After
seq.exists(p)
Besides clarity and conciseness, the latter expression might be faster (because it stops element processing when a single occurrence is found) and it can work with infinite sequences.
The predicate p
must be pure.
Also applicable to: Set
, Option
(for the second line), Iterator
.
Don’t emulate map
// Before
seq.foldLeft(Seq.empty)((acc, x) => acc :+ f(x))
seq.foldRight(Seq.empty)((x, acc) => f(x) +: acc)
// After
seq.map(f)
That’s a “classical” implementation of map
via folding in functional programming. While it’s surely didactic, there’s no need to resort to such a formulation when we have the concise built-in method (which is also faster, because it uses a simple while
loop under the hood).
Also applicable to: Set
, Option
, Iterator
.
Don’t emulate filter
// Before
seq.foldLeft(Seq.empty)((acc, x) => if (p(x)) acc :+ x else acc)
seq.foldRight(Seq.empty)((x, acc) => if (p(x)) x +: acc else acc)
// After
seq.filter(p)
Rationale is the same as in the previous tip.
Also applicable to: Set
, Option
, Iterator
.
Don’t emulate reverse
// Before
seq.foldLeft(Seq.empty)((acc, x) => x +: acc)
seq.foldRight(Seq.empty)((x, acc) => acc :+ x)
// After
seq.reverse
Again, the built-in method is cleaner and faster.
Also applicable to: Set
, Option
, Iterator
.
4.9 Matching
Here are several dedicated tips that are related to Scala’s pattern matching and partial functions.
Use partial function instead of function with pattern matching
// Before
seq.map {
_ match {
case P => ??? // x N
}
}
// After
seq.map {
case P => ??? // x N
}
The updated expression produces the same result, yet looks simpler.
It’s possible to apply this transformation to any functions, not only to map
arguments. This tip is not actually collection-related at all. However, because higher-order functions are so ubiquitous in Scala collection API, the tip is especially handy.
Convert flatMap
with partial function to collect
// Before
seq.flatMap {
case P => Seq(???) // x N
case _ => Seq.empty
}
// After
seq.collect {
case P => ??? // x N
}
The updated expression produces the same result, but looks much simpler.
Also applicable to: Set
, Option
, Map
, Iterator
.
Convert match
to collect
when the result is a collection
// Before
v match {
case P => Seq(???) // x N
case _ => Seq.empty
}
// After
Seq(v) collect {
case P => ??? // x N
}
When all the case statements produce collections, it’s possible to simplify the expression by converting the match
statement to a collect
call. In that way we can write collection creation only once and we can omit the explicit default case
branch.
Personally, I often use this trick with Option
rather than sequences per se.
Also applicable to: Set
, Option
, Iterator
.
Don’t emulate collectFirst
// Before
seq.collect{case P => ???}.headOption
// After
seq.collectFirst{case P => ???}
We have a special method for such a use case, which also works faster on non-lazy collections.
The partial function must be pure.
Also applicable to: Set
, Map
, Iterator
.
4.10 Rewriting
Merge consecutive filter
calls
// Before
seq.filter(p1).filter(p2)
// After
seq.filter(x => p1(x) && p2(x))
That’s how we can avoid creation of an intermediate collection (after the first filter call), and thus relieve GC’s burden.
We may also apply a generalized approach that relies on views (see below), so the result will be seq.view.filter(p1).filter(p2).force
.
The predicates p1
and p2
must be pure.
Also applicable to: Set
, Option
, Map
, Iterator
.
Merge consecutive map
calls
// Before
seq.map(f).map(g)
// After
seq.map(f.andThen(g))
Again, we’re creating the resulting collection directly, without an intermediate one.
We may also apply a generalized approach that relies on views (see below), so the result will be seq.view.map(f).map(g).force
.
The functions f
and g
must be pure.
Also applicable to: Set
, Option
, Map
, Iterator
.
Do sorting after filtering
// Before
seq.sorted.filter(p)
// After
seq.filter(p).sorted
Sorting is computationally expensive procedure. There’s no need to sort elements that will be potentially filtered out in the next step.
The same applies to the other possible sorting methods, like sortWith
and sortBy
.
The predicate p
must be pure.
Don’t reverse collection explicitly before calling map
// Before
seq.reverse.map(f)
// After
seq.reverseMap(f)
The first expression creates an intermediate (reversed) collection before transforming elements, which is quite reasonable sometimes (e.g. for List
). Other times it’s possible to perform the required transformation directly, without creating the intermediate collection, which is more efficient.
Don’t reverse collection explicitly to acquire reverse iterator
// Before
seq.reverse.iterator
// After
seq.reverseIterator
Again, the latter expression is simpler and might be more efficient.
Don’t convert collection to Set
to find distinct elements
// Before
seq.toSet.toSeq
// After
seq.distinct
There’s no need to create a temporary set (at least explicitly) to find distinct elements.
Don’t emulate slice
// Before
seq.drop(x).take(y)
// After
seq.slice(x, x + y)
For linear sequences all we gain is clear intent and conciseness. For indexed sequences, however, we may expect a potential performance improvements.
Also applicable to: Set
, Map
, Iterator
.
Don’t emulate splitAt
// Before
val seq1 = seq.take(n)
val seq2 = seq.drop(n)
// After
val (seq1, seq2) = seq.splitAt(n)
The optimized expression performs faster on linear sequences (like List
or Stream
), because it computes the results in a single pass.
Also applicable to: Set
, Map
.
Don’t emulate span
// Before
val seq1 = seq.takeWhile(p)
val seq2 = seq.dropWhile(p)
// After
val (seq1, seq2) = seq.span(p)
That’s how we can traverse the sequence and check the predicate only once, instead of twice.
The predicate p
must be pure.
Also applicable to: Set
, Map
, Iterator
.
Don’t emulate partition
// Before
val seq1 = seq.filter(p)
val seq2 = seq.filterNot(p)
// After
val (seq1, seq2) = seq.partition(p)
Again, the benefit is a single-pass computation.
The predicate p
must be pure.
Also applicable to: Set
, Map
, Iterator
.
Don’t emulate takeRight
// Before
seq.reverse.take(n).reverse
// After
seq.takeRight(n)
The latter expression is more concise and potentially more efficient (for both indexed- and linear sequences).
Don’t emulate flatten
// Before (seq: Seq[Seq[T]])
seq.reduce(_ ++ _)
seq.fold(Seq.empty)(_ ++ _)
seq.flatMap(identity)
// After
seq.flatten
There’s no need to flatten collections manually when we have a dedicated built-in method for doing exactly that.
Also applicable to: Set
, Option
, Iterator
.
Don’t emulate flatMap
// Before (f: A => Seq[B])
seq.map(f).flatten
// After
seq.flatMap(f)
Again, there’s no need to emulate built-in method manually. Besides improved clarity, we can skip the creation of an intermediate collection.
Also applicable to: Set
, Option
, Iterator
.
Don’t use map
when result is ignored
// Before
seq.map(???) // the result is ignored
// After
seq.foreach(???)
When side effect is all that is needed, there’s no justification for calling map
. Such a call is misleading (and less efficient).
Also applicable to: Set
, Option
, Map
, Iterator
.
Don’t use unzip
to extract a single component
// Before (seq: Seq[(A, B]])
seq.unzip._1
// After
seq.map(_._1)
There’s no need to create additional collection(s) when only a single component is required.
Other possible method: unzip3
.
Also applicable to: Set
, Option
, Map
, Iterator
.
Don’t create temporary collections
This recipe is subdivided into three parts (depending on transformation final result).
1. Transformation reduces collection to a single value.
// Before
seq.map(f).flatMap(g).filter(p).reduce(???)
// After
seq.view.map(f).flatMap(g).filter(p).reduce(???)
In place of reduce
might be any method that reduces collection to a single value, for example: reduceLeft
, reduceRight
, fold
, foldLeft
, foldRight
, sum
, product
, min
, max
, head
, headOption
, last
, lastOption
, indexOf
, lastIndexOf
, find
, contains
, exists
, count
, length
, mkString
, etc.
The exact order of transformations is not relevant — what matters is that we’re creating one or more temporary, intermediate collections, that are not needed by themselves, yet they will take heap space and burden GC. That happens because, by default, all collection transformers (map
, flatMap
, filter
, ++
, etc.) are “strict” (except for Stream
) and construct a new collection with all its elements as a result of transformation.
That’s where views come to the rescue — we may think of view as some kind of Iterator
, that allows re-iteration:
- Views are “lazy” — elements constructed only when they are needed.
- Views don’t hold created elements in memory (which even
Stream
does).
To go from a collection to its view, we use the view
method.
2. Transformation produces a collection of the same class.
It’s possible to use views when the final result of transformation is still a collection — the force
method will build a collection of the original class (while creation of all the intermediate collections is still avoided):
// Before
seq.map(f).flatMap(g).filter(p)
// After
seq.view.map(f).flatMap(g).filter(p).force
If the only intermediate transformation is filtering, you may consider using withFilter
method as an alternative:
seq.withFilter(p).map(f)
The method is originally intended to be used in for comprehensions. It works much like a view — it creates a temporary object that restricts the domain of subsequent collection transformations (so, it reorders possible side effects). However, there’s no need to explicitly convert collection to / from temporary representation (by calling view
and force
).
While view-based approach is more universal, in this particular case you may prefer withFilter
method because of the conciseness.
3. Transformation creates a collection of different class.
// Before
seq.map(f).flatMap(g).filter(p).toList
// After
seq.view.map(f).flatMap(g).filter(p).toList
This time we use a suitable converter method (instead of the generic force
call), so the result will be a collection of different class.
There also exists an alternative way of handling “transformation + conversion” case that relies on breakOut
:
seq.map(f)(collection.breakOut): List[T]
Such an expression is functionally equivalent to using a view, however this approach:
- needs expected type to be explicit (which often requires an additional type annotation),
- is limited to a single transformation (like
map
,flatMap
,filter
,fold
, etc.), - looks rather tricky (because implicit builders are usually omitted from the Scala Collections API docs).
Thus, it’s usually better to substitute breakOut
for a view, which is more clear, more concise and more flexible.
Views are especially efficient when collection size is relatively large.
All the functions (like f
and g
) and predicates (p
) must be pure (because view might delay, skip or reorder computations).
Also applicable to: Set
, Map
.
Use assignment operators to reassign a sequence
// Before
seq = seq :+ x
seq = x +: seq
seq1 = seq1 ++ seq2
seq1 = seq2 ++ seq1
// After
seq :+= x
seq +:= x
seq1 ++= seq2
seq1 ++:= seq2
Scala offers a syntactic sugar known as “assignment operators” — it automatically converts x <op>= y
statements into x = x <op> y
where <op>
is some character operator (like +
, -
, etc.). Note, that if <op>
ends with :
it is considered right-associative (i. e. invoked on the right expression, instead of the left).
There’s also a special syntax for lists and streams:
// Before
list = x :: list
list1 = list2 ::: list1
stream = x #:: stream
stream1 = stream2 #::: stream1
// After
list ::= x
list1 :::= list2
stream #::= x
stream1 #:::= stream2
The optimized statements are more concise.
Also applicable to: Set
, Map
, Iterator
(with operator specifics in mind).
Don’t convert collection type manually
// Before
seq.foldLeft(Set.empty)(_ + _)
seq.foldRight(List.empty)(_ :: _)
// After
seq.toSet
seq.toList
We have corresponding build-it methods for doing that, which are cleaner and faster.
If you need to transform or filter values in the course of the conversion, consider using views or similar techniques, as described above.
Also applicable to: Set
, Option
, Iterator
.
Be careful with toSeq
on non-strict collections
// Before (seq: TraversableOnce[T])
seq.toSeq
// After
seq.toStream
seq.toVector
Because Seq(...)
creates a strict collection (namely, Vector), we may be tempted to use toSeq
to convert a non-strict entity (like Stream
, Iterator
, or “view”) to a strict collection. However, TraversableOnce.toSeq
returns a Stream
under the hood, which is a lazy collection, and that might result in hard-to-find bugs or performance problems. Even if a stream is exactly what you expect, such an expression might confuse other people who read your code.
Here’s a typical example of the pitfall:
val source = Source.fromFile("lines.txt")
val lines = source.getLines.toSeq
source.close()
lines.foreach(println)
Such a code will throw an IOException
complaining that the stream is already closed.
To clarify the intent, it’s better to write toStream
explicitly, or, if we need a strict collection, after all, to use toVector
instead of toSeq
.
Don’t convert to String
manually
// Before (seq: Seq[String])
seq.reduce(_ + _)
seq.reduce(_ + separator + _)
seq.fold(prefix)(_ + _)
seq.map(_.toString).reduce(_ + _) // seq: Seq[T]
seq.foldLeft(new StringBuilder())(_ append _)
// After
seq.mkString
seq.mkString(prefix, separator, "")
The latter approach is cleaner and potentially faster, because it uses a single StringBuilder under the hood.
Other possible methods are: reduceLeft
, reduceRight
, foldLeft
, foldRight
.
Also applicable to: Set
, Option
, Iterator
.
5. Sets
Most of the tips for sequences are applicable to sets as well. Additionally, there are several set-specific tips available.
Don’t use sameElements
to compare unordered collections
// Before
set1.sameElements(set2)
// After
set1 == set2
This rule was introduced earlier (for sequences), yet for sets the rationale is even more sound.
The sameElements
method might return indeterministic results for unordered collections, because this method respects order of elements, while we cannot rely on the order in a set.
Classes that explicitly guarantee predictable iteration order (like LinkedHashSet
) are exceptions from this rule.
Also applicable to: Map
.
Use Set
instance as a function value
// Before (set: Set[Int])
Seq(1, 2, 3).filter(set(_))
Seq(1, 2, 3).filter(set.contains)
// After
Seq(1, 2, 3).filter(set)
Because an instance of Set[T]
is also Function1[T, Boolean]
, you can use it directly as such.
Don’t compute set intersection manually
// Before
set1.filter(set2)
// After
set1.intersect(set2) // or set1 & set2
The latter expression is more clear and concise (while performance is the same).
This transformation can be performed on sequences, however we should keep in mind, that in such a case, duplicate elements will be handled differently.
Don’t compute set difference manually
// Before
set1.filterNot(set2)
// After
set1.diff(set2) // or set1 &~ set2
Again, the updated expression is more clear and concise (while performance is the same).
The transformation is potentially applicable to sequences, though we should consider duplicate elements.
6. Options
Technically, Option
is not a part of the Scala Collections, however it provides similar interface (with monadic methods, etc.) and behaves like a special kind of collection that may or may not hold some value.
Many of the tips for sequences are also applicable to options. Besides, here are dedicated tips, that are based on Option
API.
6.1 Value
Don’t compare option values with None
// Before
option == None
option != None
// After
option.isEmpty
option.isDefined
While such a comparison is fully legitimate, we have a more straightforward way to check whether option is defined (by analogy with collections: we normally write seq.isEmpty
, not seq == Seq.empty
).
Another advantage of this simplification, is that if you ever decide to change the type from Option[T]
to just T
, scalac will compile the former expression (yielding only a warning), while compilation of the latter one will result in a righteous error.
One more argument in favor of the latter approach is that it helps to “catch” null
values that came instead of Option
values (by error).
Don’t compare option values with Some
// Before
option == Some(v)
option != Some(v)
// After
option.contains(v)
!option.contains(v)
This tip is a complement to the previous one (with the same justification).
Don’t rely on instance type to check value existence
// Before
option.isInstanceOf[Some[_]]
// After
option.isDefined
There’s simply no need for such trickery.
Don’t resort to pattern matching to check value existence
// Before
option match {
case Some(_) => true
case None => false
}
option match {
case Some(_) => false
case None => true
}
// After
option.isDefined
option.isEmpty
Again, while the former expression is technically correct, there’s no justification for such an extravagance. Besides, the simplified expression will work faster.
Also applicable to: Seq
, Set
.
Don’t negate value existence-related properties
// Before
!option.isEmpty
!option.isDefined
!option.nonEmpty
// After
seq.isDefined
seq.isEmpty
seq.isEmpty
Rationale is the same as the one for sequences — simple property adds less visual clutter than a compound expression.
Note that we have the synonyms: isDefined
(option-specific) and nonEmpty
(sequence-specific). It might be reasonable to prefer the former one, to distinguish options from sequences.
6.2 Null
Don’t compare value with null
explicitly to construct an Option
// Before
if (v != null) Some(v) else None
// After
Option(v)
We have a special, concise syntax for that.
Don’t use Option(...)
with a constant
// Before
Option("constant")
// After
Some("constant")
When an argument to Option(...)
cannot be null
, it’s better to avoid the redundant null
check and to encode value existence at the type level.
Don’t provide null
as an explicit alternative
// Before
option.getOrElse(null)
// After
option.orNull
We can rely on the predefine method in such a case, so the expression will become shorter.
6.3 Processing
It’s possible to single out a groups of tips that are related to how Option
value is processed.
As Option
‘s API documentation says that “the most idiomatic way to use an Option instance is to treat it as a collection or monad and use map, flatMap, filter, or foreach”, the basic principle here is to avoid “check & get” chains, implemented either via if
statement or via pattern matching.
The goal is conciseness and robustness, the “monadic” code is:
- more concise and intelligible,
- safeguarded against
NoSuchElementException
norMatchError
exceptions at runtime.
This rationale is common for all the following cases.
Don’t emulate getOrElse
// Before
if (option.isDefined) option.get else z
option match {
case Some(it) => it
case None => z
}
// After
option.getOrElse(z)
Don’t emulate orElse
if (option1.isDefined) option1 else option2
option1 match {
case Some(it) => Some(it)
case None => option2
}
// After
option1.orElse(option2)
Don’t emulate exists
// Before
option.isDefined && p(option.get)
if (option.isDefined) p(option.get) else false
option match {
case Some(it) => p(it)
case None => false
}
// After
option.exists(p)
Don’t emulate forall
// Before
option.isEmpty || (option.isDefined && p(option.get))
if (option.isDefined) p(option.get) else true
option match {
case Some(it) => p(it)
case None => true
}
// After
option.forall(p)
Don’t emulate contains
// Before
option.isDefined && option.get == x
if (option.isDefined) option.get == x else false
option match {
case Some(it) => it == x
case None => false
}
// After
option.contains(x)
Don’t emulate foreach
// Before
if (option.isDefined) f(option.get)
option match {
case Some(it) => f(it)
case None =>
}
// After
option.foreach(f)
Don’t emulate filter
// Before
if (option.isDefined && p(option.get)) option else None
option match {
case Some(it) && p(it) => Some(it)
case _ => None
}
// After
option.filter(p)
Don’t emulate map
// Before
if (option.isDefined) Some(f(option.get)) else None
option match {
case Some(it) => Some(f(it))
case None => None
}
// After
option.map(f)
Don’t emulate flatMap
// Before (f: A => Option[B])
if (option.isDefined) f(option.get) else None
option match {
case Some(it) => f(it)
case None => None
}
// After
option.flatMap(f)
6.4 Rewriting
Convert map
with getOrElse
to fold
// Before
option.map(f).getOrElse(z)
// After
option.fold(z)(f)
Those expressions are semantically equivalent (in both cases z
is computed lazily, on demand), yet the latter expression is more concise. The transformation may sometimes require an additional type annotation (because of the way how Scala’s type inference works), and, in such cases, the former expression might be preferable.
Keep in mind that this particular simplification is somehow controversial, because the latter expression looks more obscure, especially if you not accustomed to it.
Don’t emulate exists
// Before
option.map(p).getOrElse(false)
// After
option.exists(p)
We presented a rather similar rule for sequences (which is also directly applicable to options). This transformation is a special case with getOrElse
call.
Don’t emulate flatten
// Before (option: Option[Option[T]])
option.map(_.get)
option.getOrElse(None)
// After
option.flatten
The latter expression looks cleaner.
Don’t convert option to sequence manually
// Before
option.map(Seq(_)).getOrElse(Seq.empty)
option.getOrElse(Seq.empty) // option: Option[Seq[T]]
// After
option.toSeq
There’s a special method for doing that, which is more concise and, less computationally-expensive.
7. Maps
Just like with other collection classes, many tips for sequences are applicable to maps as well, so we’ll enumerate only map-specific ones.
Don’t search for a value manually
// Before
map.find(_._1 == k).map(_._2)
// After
map.get(k)
While in principe the former code will work, the performance is sub-optimal, because Map
is not a simple collection of (key, value) pairs — it can perform lookups in a much more efficient way.
Moreover, the latter expression is simple and easier to undestand.
Don’t use get
when a raw value is needed
// Before
map.get(k).get
// After
map(k)
There’s no need to produce an intermediate Option
value when a raw value is needed.
Don’t use lift
instead of get
// Before
map.lift(k)
// After
map.get(k)
There’s no need to treat map value as a partial function to acquire an optional result (which is useful for sequences), because we have a built-in method with the same functionality.
Don’t call get
with getOrElse
separately
// Before
map.get(k).getOrElse(z)
// After
map.getOrElse(k, z)
The single method call is simpler, both syntax- and performance-wise. In both cases z
is computed lazily, on demand.
Use Map
instance as a function value
// Before (map: Map[Int, T])
Seq(1, 2, 3).map(map(_))
// After
Seq(1, 2, 3).map(map)
Because an instance of Map[K, V] is also Function1[K, V], you can use it directly as such.
Don’t extract keys manually
// Before
map.map(_._1)
map.map(_._1).toSet
map.map(_._1).toIterator
// After
map.keys
map.keySet
map.keysIterator
The optimized expressions are more intelligible (and might be faster).
Don’t extract values manually
// Before
map.map(_._2)
map.map(_._2).toIterator
// After
map.values
map.valuesIterator
The simplified expressions are more comprehensible (and potentially faster).
Be careful with filterKeys
// Before
map.filterKeys(p)
// After
map.filter(p(_._1))
The filterKeys
methods wraps the original map without copying any elements. There’s nothing wrong with that per se, however such a behaviour is hardly expected from filterKeys
method. Because it unexpectedly behaves like a view, code performance might be substantially degraded in some cases (e. g. for filterKeys(p).groupBy(???)
).
Another possible pitfall is unexpected “laziness” (collection transformers are expected to be strict by default) – the predicated is not evaluated at all withing the method call itself, so possible side effects might be reordered.
The filterKeys
method should probably be deprecated, because it’s now impossible to make it strict without breaking backward compatibility. A more suitable name for the current implementation is withKeyFilter
(by analogy with the withFilter
method).
All in all, it seems reasonable to follow the Rule of Least Surprise and to filter keys manually.
Nevertheless, as the view-like functionality of filterKeys
is potentially useful (when only a few entries will be accessed while the map is relatively large), we may still consider using the method. To keep other people who read (or modify) our code from confusion, it’s better clarify our intent by defining an appropriate method synonym:
type MapView[A, +B] = Map[A, B]
implicit class MapExt[A, +B](val map: Map[A, B]) extends AnyVal {
def withKeyFilter(p: A => Boolean): MapView[A, B] =
map.filterKeys(p)
}
We’re using the type alias MapView
to indicate that the result map is view-like.
Another option is to define a simple helper method, like:
def get(k: T) = if (p(k)) map.get(k) else None
Be careful with mapValues
// Before
map.mapValues(f)
// After
map.map(f(_._2))
Rationale is the same as in the previous tip. In a similar way, we may define an unambiguous synonym:
type MapView[A, +B] = Map[A, B]
implicit class MapExt[A, +B](val map: Map[A, B]) extends AnyVal {
def withValueMapper[C](f: B => C): MapView[A, C] =
map.mapValues(f)
}
A simpler option would be to define a helper method, like:
def get(k: T) = map.get(k).map(f)
Don’t filter out keys manually
// Before
map.filterKeys(!seq.contains(_))
// After
map -- seq
We can rely on the simpler syntax to filter out keys.
Use assignment operators to reassign a map
// Before
map = map + x -> y
map1 = map1 ++ map2
map = map - x
map = map -- seq
// After
map += x -> y
map1 ++= map2
map -= x
map --= seq
As with sequences, we may rely on syntactic sugar to simplify such statements.
8. Supplement
In addition to the listed recipes, I recommend you to take a look at the Scala Collections documentation, which is surprisingly good an easy to read.
See also:
- Scala for Project Euler — Concise functional Scala solutions to the Project Euler problems.
- Ninety-nine — Ninety-nine problems in Scala, Java, Clojure and Haskell (with multiple solutions)
Those puzzles really helped me to shape and deepen my understanding of Scala collections.
While the list of tips is rather comprehensive, it’s probably far from complete. Moreover, because applicability of the recipes varies greatly, some fine-tuning is definitely needed. Your suggestions welcome.
hi, pavel. quick question – is this fragment from your first example valid:b
seq filter { x => builder.append(x); x > 3 } headOption
or does it need to be something like this:
seq filter { x => { builder.append(x); x > 3 } headOption
sorry, that last line should read:
seq filter { x => { builder.append(x); x > 3 } } headOption
Hi Mark!
Yes, such a code is syntactically correct, though it looks a little bit unusual as a one-liner. Here’s a more conventional formulation (we’re using line separators instead of the semicolons):
seq.filter { x =>
builder.append(x)
x > 3
}
Thanks to the comments on Reddit I’ve actually inverted the recommendations on
filterKeys
andmapValues
methods.Hi Pavel,
Regarding “Don’t resort to map to transform values”: I think you should add the mapValues() does not create a new map, but merely a ‘view’ on the original map. This means the transformation is not executed when you map the values, but later when you access them (on the view). Sometimes this may be exactly what you want, but oftentimes it is not (and you have to resort to map()). I know I’ve been bitten by it in the past, when I assumed a new Map to be created.
Groets, Hugo.
Hi Hugo! I totally agree – I have already inverted the recommendations on
filterKeys
andmapValues
methods (and added an explanation). Thanks for the additional notice 🙂Very comprehensive!
“Don’t retrieve (first|last) element by index” – is it worth suggesting that the total functions headOption and lastOption are generally more useful than head or last, either here or as separate points?
Indeed great collection of tips, I am learning Scala and this will be very handy once I get going. Thanks a lot.
[…] the Intro to Collections, and its sections on views and performance characteristics. Seq traits.  Collections tips and tricks. Language […]
Great tips!
What do you think about collection.breakOut when transforming collections?
http://stackoverflow.com/questions/1715681/scala-2-8-breakout/1716558#1716558
HolyHaddock, this list of recipes is not intended to be a general-purpose guide, so the already existing, separate tips on
headOption
andlastOption
seem to be enough.Hi Vasiliy!
I did consider
breakOut
(and even created SCL-8276) but then changed my mind. Here’s why:1.
breakOut
is too tricky, because the implicit builders are usually omitted from public API docs.2.
breakOut
is less flexible, because it needs expected type to be explicit (which often requires an additional type annotation), and because it is restricted to “single transformation / conversion” case, while view can handle arbitrary number of transformations.3. We have a more general mechanism – views, that are more clear and more flexible.
The SO example can be rewritten as:
List("London", "Paris").view.map(x => (x.length, x)).toMap
– shorter and cleaner (and doesn’t require the additional import statement).
I’ll probably add some note regarding the
breakOut
. Thank you for the notice!Hi Pavel,
This answer says that in some cases breakOut can be helpful even with views: http://stackoverflow.com/questions/21285620/scala-collection-breakout-vs-views/21286268#21286268
Also, a tip regarding options, I’ve seen this in some projects:
option.map(set + _).getOrElse(set)
If we treat option as a max-one-element set (or seq) then it is obviously:
set ++ option
Hi Vasiliy,
It seems that the SO answer is not entirely correct, because
collection.view.map(somefn)(breakOut)
throws an exceptions (it turns out, thatview
andbreakOut
are not too compatible), andcollection.view.map(someFn)
results in a view that is not converted back to a collection (so those lines are not equivalent in principle).Here are workable and comparable expressions:
1.
seq.map(f)(breakOut): List[Int]
2.
seq.view.map(f).toList
As far as I understand, these expressions are equivalent in terms of performance (you may check how the
toList
is implemented).There’s a tiniest overhead because of a view creation, but it is negligible in most practical uses (and perfectly acceptable, considering the advantages listed above).
The option-related inspection looks relevant, I will, probably, append it to the list, thanks!
I’ve added info on
breakOut
to the “Don’t create temporary collections” tip.You write:
// Before
seq.drop(x).take(y)
// After
seq.slice(x, y + 1)
These are not equivalent; I believe perhaps you meant to write
seq.slice(x, x + y)
Hi Jacob!
Indeed… fixed. That could have been a really “useful” inspection 🙂
Thanks!
Hi Pavel,
Great post which should be on every developer desk.
About the filterKeys and mapValues methods: You wrote in the comments that you inverted your recommendations. I didn’t saw the original recommendations so I’ll refer only to the currents:
For both you recommend “Avoid using” with the argument that code performance might be substantially degraded in some case. It’s true, but for other cases is may result if better performances. In general, access the ‘view’ map less times that the map size, it will both perform better( and save memory).
Though I agree that the name is misleading and I think the docs should be more explicit about it’s laziness.
I don’t think there should be recommendation against using them, but note about it’s pitfall indeed may help.
Hi Rotem,
I usually object to compromising code clarity in favor of some fragile performance gain. However, in those cases we can actually make the best of both worlds, by defining unambiguous synonyms.
I agree that the previous “avoid using” phrasing is a bit too categorical, so I’ve changed the titles to “be careful with” and substantially expanded the explanation.
Thanks for the valuable suggestion!
[…] Scala Collections Tips and Tricks – a pretty useful (especially for novices in Scala) set of DOs and DON’Ts related to Scala collection facilities. […]
You should add a line about filter vs withFilter. It may not be obvious for begginners.
// Before
seq.filter(p1).map(p2)
// After
seq.withFilter(p1).map(p2)
// Alternative for comprehension notation:
for( e <- seq if p1 ) yield p2
Hi David!
I initially considered
withFilter
, however, just like withbreakOut
case, we have a better alternative – views, which is more clear and universal (withFilter
was intended to be a part of internal for comprehensions implementation).So,
seq.withFilter(p).map(f)
can be replaced withseq.view.filter(p).map(f).force
.Nevertheless, the
withFilter
method definitely worth mentioning, thanks for the tip!Thank you for your reply Pavel !
I understand the analogy with breakOut but I think it doesn’t have the same constraints than it (i.e. specifiying the type of the output collection).
For me the only difference is the syntax. And the withFilter one, even if not universal, is shorter.
Secondly, in my comprehension of views, their usage involves the creation of an intermediate object to handle the transformations and this is not the case for the withFilter approach. It may lead to slightly better performance, even if i guess it would be very marginal in most cases.
Thus, I think it would be great for the reader to suggest this alternative solution, as you did for breakOut.
Like views,
withFilter
method creates an intermediate object (namely,WithFilter
instance). However, there’s no need to explicitly convert collection to / from temporary representation (by callingview
andforce
), so, in this particular case, the withFilter form is indeed much more concise.I’ve added an explanation of this alternative. Thanks!
thank you for the tips and tricks. I found a typo:
seq.spiltAt(n) should be seq.splitAt(n)
Fixed. Thanks!
Hi Pavel, don’t you think that this code sample
`seq filter { x => builder.append(x); x > 3 } headOption`
could be rewritten to remove postfix operator call.
Newbies that learning language may get confused.
Thanks, Paul
Hi! Now, when postfix operators need to be explicitly enabled, this makes even more sense. I’ve updated the example. Thanks!
Hi 🙂
list1 = list2 ::: list
Shuld `list` be replaced with `list1` here?
Hi Denis,
Indeed, it should. Thanks!
The Option discussion on Don’t Emulate flatten has a little problem:
The 2nd before line seems equivalent to the suggested after line, but the very first before line is problematic as it can throw exception (if option == Some(Option.empty).
// Before (option: Option[Option[T]])
option.map(_.get)
option.getOrElse(None)
// After
option.flatten
Pavel, please reconsider diff recommendation.
For two SortedSet-s it is currently O(n^2).
While filterNot is much faster: O(n) + n*O(log(n)).