Scrolling with pleasure

In this article I explain what it takes to implement high-quality smooth / high-precision scrolling on modern computers and why some systems have it while others don’t. Most of the insights came from my work on implementing “true smooth scrolling” in IntelliJ IDEA.

The article complements my previous article “Typing with pleasure”, just like scrolling naturally complements typing in the day-to-day activity of most computer users. Nowadays, some interfaces are based solely on scrolling! Because scrolling is so ubiquitous, any improvement in its functioning automatically improves the user experience and can boost our productivity big time.

The topic of scrolling seems to fascinate a surprising number of people: computer forums are filled with questions like “Why does Mac OS X trackpad moves so much better than Windows?” or “Why no smooth scrolling on Linux?”. Despite the curiosity, the answers usually come down to something like “Mac OS uses magic, just deal with it”, which only makes the issue even more mysterious. However, as Clarke’s third law says, “Any sufficiently advanced technology is indistinguishable from magic”, so no actual magic is required: the proper technology is all that is needed.

Although the article contains many technical details, I hope that it will be interesting not only to computer programmers, but also to people who wonder how scrolling works under the hood, why we have what we have and how we can make scrolling better.

All the code examples are available as a GitHub repository.

1. Introduction

Since “smooth scrolling” became popular, the term has been used for many different things; in particular, the notions of smoothness and precision got mixed up. We should clearly distinguish between the different characteristics of scrolling.

1.1 Smooth scrolling

Smooth scrolling is scrolling that is visually continuous, i.e. not discrete.

What is smooth scrolling good for? Isn’t it just “bells and whistles”? Nothing of the kind! The human visual system is adapted to deal with real-world objects, which move continuously. Abrupt image transitions place a burden on the visual system and increase the user’s cognitive load (because it takes additional time and effort to consciously match the before and after pictures).

A number of studies have demonstrated measurable benefits from smooth scrolling. For example, here’s a quote from Klein, 2005:

This study shows that animated scrolling significantly improves both efficiency and user satisfaction. The magnitude of this improvement is greatest for repetitive documents lacking visual landmarks.

To implement continuous scrolling we need continuous animation. However, we don’t necessarily need high-precision input for that: for example, it’s possible to scroll content smoothly from end to end after only a single input event (like a keystroke), so that the transition is perfectly smooth even though the positioning is extremely imprecise.

1.2 High-precision scrolling

High-precision scrolling is scrolling that allows fine-grained positioning with steps that are smaller than 1 line (but not necessarily as small as 1 pixel).

As a side effect, applications that don’t animate scrolling may produce a smoother result when provided with high-precision events. However, high-precision events cannot guarantee smooth scrolling by themselves, because input resolution is often inadequate for visually smooth transitions.

Here’s what high-precision scrolling really offers (compared to coarse-grained scrolling):

  • Accurate positioning. Even though large scrolling steps can be animated, there’s no way to hold the content between the steps. High-precision scrolling makes it possible to position the content exactly as the user wants.
  • Better reproduction of dynamics. Because high-precision events have higher resolution, they can reproduce fine-grained dynamics of the original motion much better. This improves hand-eye coordination and makes it easier for the user to control the scrolling and to track the content.
  • Lower input latency. High-precision events increase the responsiveness of scrolling, because the OS can dispatch those events to the application immediately, without accumulating multiple events to get a sufficiently large delta.

A particular kind of high-precision scrolling, known as “pixel-perfect scrolling”, guarantees a minimal step of 1 pixel. That kind of scrolling usually requires pixel-precise positioning and OS-level input acceleration to offer both fine-grained and long-distance scrolling at the same time. Pixel-perfect scrolling also guarantees the lowest possible input latency.

1.3 Summary

Because scrolling precision and scrolling smoothness are semi-independent, in practice, scrolling can be:

  1. Non-smooth, non-precise.
  2. Smooth, but not precise.
  3. Precise, but not truly smooth.
  4. Precise and smooth.

Obviously, we want scrolling to be both smooth and precise (ideally, pixel-precise).

To get an idea of scrolling smoothness, scrolling precision and scrolling latency, you may check the following video (it’s better to open it in a separate window):

2. Model

Any application that uses scrolling has an internal model that represents its scrolling state. This model typically contains the following properties:

  • minimum & maximum positions,
  • current position,
  • line step (“unit increment”) & page step (“block increment”),
  • visible range.

When an application employs the default OS / toolkit scrolling widgets, those values are stored in the scrollbars.

If an application doesn’t rely on the standard scrolling capabilities (e.g. most web browsers), it often still uses scrollbar components as its scrolling model (although, in principle, programmers may choose to implement a custom scrolling model that is independent of scrollbars).

What’s the deal with the scrolling model? The scrolling model either permits or prohibits precise content positioning and smooth scrolling at the most basic level, depending on whether it uses lines as units of measurement.

One way to model scrolling state is to measure all the properties in “lines”. This method is a legacy of the text-mode era, when content was represented as characters rather than individual pixels. In text mode, the screen consists of uniform character cells, and a line-based scrolling model goes well with that. Although modern GUI applications are not bound by the character grid anymore, many programs still measure scrolling in lines rather than in pixels. This is most typical of text editors (for example, Notepad, GVim) and terminal emulators, where text lines are apparent, but it can also be found in other kinds of programs (e.g. file managers).

It’s easy to determine whether a particular program relies on a line-based model: simply use the mouse pointer to drag the scrollbar’s thumb continuously and check whether the content is scrolled discretely (line by line). For example, Ubuntu’s terminal emulator uses a line-based model (and, surprisingly, the same goes for the Mac OS Terminal):

Ubuntu terminal

Needless to say, when a line-based scrolling model is in use, there is, in principle, no way to scroll the content smoothly (because it’s impossible to position the content with enough precision for smooth animation). No amount of “quality touchpads” can change that.

In addition to the coarse-grained scrolling, the line-based model has a few more quirks:

  • It produces different “visual” speeds for different line heights: the same input event results in different pixel shifts depending on the document font size. The larger the font, the faster (and the less smooth) the scrolling animation is (thus, scrolling speed also varies between applications).
  • If some of the lines have different heights (e.g. due to different fonts), scrolling becomes non-linear: even when the input events are uniform, the visual shifts are not. This can also happen when the document font is fixed, but a wrapped line is still considered “a single line” (this is, for example, what GVim does, and that feels rather unnatural).

What kind of model do we need to permit smooth scrolling? We need to store the scrolling state with pixel-level precision. If the scrolling model uses integer variables, this requires the minimum / maximum variables to match the real content size in pixels. With floating-point variables we may use a constant range from 0.0 to 1.0; however, this approach might introduce rounding errors. It seems that the ideal scrolling model should match the actual content size in pixels, but use floating-point numbers for further extensibility, because what is considered “real” pixels now can become device-independent pixels later.

When the scrolling model is pixel-accurate, both precise and smooth scrolling are possible in principle. However, a proper model by itself is not enough: the resulting precision still depends on the precision of the input events, and visual smoothness depends on the rendering algorithm.
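A pixel-based model of this kind can be sketched as follows. This is only an illustration under assumptions: the class name, the field names, the default line step and the page step rule are hypothetical and are not taken from any particular toolkit.

```python
from dataclasses import dataclass


@dataclass
class ScrollModel:
    """Scrolling state measured in (possibly fractional) pixels."""
    content_size: float       # total content length, in pixels
    view_size: float          # visible range, in pixels
    position: float = 0.0     # offset of the view's top edge
    line_step: float = 16.0   # "unit increment" (hypothetical default)

    @property
    def max_position(self) -> float:
        # The view cannot move past the end of the content.
        return max(0.0, self.content_size - self.view_size)

    @property
    def page_step(self) -> float:
        # "Block increment": one view minus one line of overlap (assumed rule).
        return max(self.line_step, self.view_size - self.line_step)

    def scroll_by(self, delta: float) -> float:
        """Shift the position by delta pixels, clamped to the valid range."""
        self.position = min(self.max_position, max(0.0, self.position + delta))
        return self.position


model = ScrollModel(content_size=10000.0, view_size=1000.0)
model.scroll_by(2.5)   # fractional positions are representable
model.scroll_by(1e6)   # clamped to max_position
```

Because the range matches the content size, a delta of 1.0 always means one pixel, no matter how long the document is.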

3. Input

Suppose we have the right model, so what? Now we need to get the input from the user and adjust our model accordingly. The characteristics of these updates vary widely, depending on the particular device and on the specific OS API that we use to acquire the input. Let’s examine the most typical use cases.

3.1 Scrollbar

Using a mouse (or a touchpad) to drag a “thumb” (also known as “bar” or “knob”) on a scrollbar is the most apparent, easily discoverable way to do the scrolling.

Scrollbar

This interaction seems inherently continuous, but can we really use thumb location to control scrolling position directly? Let’s find out.

It should be noted that scrollbar also allows scrolling by line (by clicking the arrows) and scrolling by page (by clicking the track). Those actions are fundamentally different and we will discuss that type of scrolling later, together with the mouse wheel scrolling.

Keep in mind that scrollbar is not only a means of scrolling as such, but it’s also an essential visual indicator of the scrolling position, regardless of particular scrolling method.

3.1.1 Pointer precision

As we employ a pointer to drag the thumb, we first need to determine how smooth the pointer movement by itself is. For simplicity, I assume that we use a computer mouse to position the pointer, but all the reasoning is applicable to a touchpad as well (especially considering that, both in hardware and in software, touchpad is usually represented as a mouse).

While most of the conclusions can be drawn theoretically, some real-world numbers are always nice to have. To collect quantitative data, I used my current mouse — Logitech M-UAE55, which is a rather typical office mouse:

Logitech M-UAE55

To intercept USB HID reports, I used Arduino Leonardo with USB Host Shield as a “mouse proxy” (code):

Arduino Leonardo with USB Host Shield

Although this device is not a passive bus analyzer, because the ports support USB 2.0 and 1000 Hz polling rate, its precision is more than enough for our purposes. The advantage is that we can adjust mouse polling rate regardless of operating system.

Simultaneously, I used an application that tracks movement of the pointer (code). In this way, it’s possible to analyze both the mouse reports and the corresponding pointer movements in real time.

For those who have no Arduino at hand but want to repeat the experiments, I’ve also created a software-only mouse analyzer (code), though it works only on Windows and it cannot tweak the USB polling rate.

3.1.2 Spatial resolution

For a start, let’s estimate spatial resolution of a mouse pointer (or, simply speaking, pointer’s step length).

Although most operating systems provide methods to access low-level mouse input (for example, WM_INPUT in Windows, XI raw events in Linux, I/O Kit in Mac OS), typical desktop applications never access that data directly. Instead, the OS transforms and aggregates the incoming data and then exposes the result to applications as “pointer motion”. In some systems, the distinction is somewhat blurred: for example, Windows has the WM_MOUSEMOVE message that is “posted to a window when the cursor moves”. In Linux, the API is more explicit: Xlib refers specifically to pointer motion (PointerMotionMask, XPointerMovedEvent, etc).

Because, at the level of the operating system, pointer coordinates are usually measured in pixels, “1 pixel” is the best precision / granularity we can attain. On the one hand, this statement is obvious, yet, on the other hand, it’s rather peculiar, because the precision of the input device (mouse) is limited by the precision of the output device (display). Thus, a high-resolution display offers not only crisper graphics, but also potentially more fine-grained pointer control (OS X, however, aligns the pointer to device-independent pixels, not to physical pixels, so a Retina Display doesn’t improve the pointer accuracy).

Do we need single-pixel precision? As the angular resolution of the human eye is about 0.02°, people can differentiate pixels with density up to 300 PPI (from a typical screen viewing distance). Commonly used monitors have pixel density way below this number (for example, my Iiyama XB22783HSU has only 82 PPI). Thus, single-pixel precision is desirable (and, actually, insufficient, as we’ll see later).

So, can we use a mouse to move the pointer pixel-by-pixel?

3.1.3 Sensor resolution

Although the pointer is aligned to the pixel grid, mouse, by itself, doesn’t deal with the notion of pixels — it reports relative movement in abstract “counts” (or “steps”) that are tied neither to pixels, nor to a predetermined physical distance.

The more we move the mouse, the more counts it reports. The exact number of counts depends on mouse’s sensor resolution, which is measured in terms of counts per inch (CPI) — the number of counts the mouse will report when it moves one inch (this number is often incorrectly listed as “dots per inch” or DPI in mouse specification).

By comparing the travel distance with the number of reported counts, we can determine the resolution of a particular sensor experimentally. For example, my mouse reports 7880 counts after traveling 400 mm, which corresponds almost exactly to 500 CPI.

How do we map mouse counts to screen pixels? The simplest way is to shift the pointer by one pixel per reported count. That happens on “medium” pointer speed in most OSes (unless the OS applies “acceleration”):

Mouse pointer speed

However, depending on the exact sensor resolution, screen resolution and display size, the direct 1-to-1 mapping might result in either an excessively slow or an excessively fast pointer response that can make the control uncomfortable. To counter that, operating systems offer a pointer speed adjustment that makes the cursor move faster or slower. Technically, this adjustment is implemented as a multiplier coefficient that determines the average pointer step per mouse count, so the step can be arbitrarily bigger than 1 pixel.

For example, to control my mouse comfortably, I need to shift the pointer speed slider one notch to the right, so that my mouse travels about 65 mm while the pointer crosses my Full HD screen horizontally (1920 pixels). This corresponds to 1.5 pixel step per mouse count (and on a 4K screen that would be twice as much). By tracking the pointer movement, we can confirm that the average step is indeed 1.5 pixels (which is larger than 1).
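The arithmetic behind these figures is easy to reproduce. The sketch below only restates the measurements quoted above (7880 counts per 400 mm, 65 mm of travel for 1920 pixels); the function names are made up for this example.

```python
MM_PER_INCH = 25.4


def counts_per_inch(counts: int, distance_mm: float) -> float:
    """Sensor resolution derived from a measured travel distance."""
    return counts / (distance_mm / MM_PER_INCH)


def pixels_per_count(pixels: int, distance_mm: float, cpi: float) -> float:
    """Average pointer step for a given mouse travel across the screen."""
    counts = distance_mm / MM_PER_INCH * cpi
    return pixels / counts


cpi = counts_per_inch(7880, 400)         # about 500.4
step = pixels_per_count(1920, 65, 500)   # about 1.50
print(f"{cpi:.1f} CPI, {step:.2f} px per count")
```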

As there’s no way to position the pointer within a fraction of a pixel, the operating system has to resort to non-uniform pointer steps in spite of uniform input from the mouse (because of quantization). Thus, insufficient sensor resolution results not only in a suboptimal step length, but also in non-uniform pointer motion.

For instance, when my mouse generates (1, 1, 1, 1, 1, ...) movement reports, the pointer is shifted by (1, 2, 1, 2, 1, ...) pixels respectively, so that the average step approaches 1.5 pixels, yet the movement is not uniform. While 1-pixel oscillation might not seem like a big deal, we need to take subsequent “magnification” by the scrollbar into account (as we will see later).
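This pattern is what one gets by scaling each report and carrying the fractional remainder over to the next one. The sketch below is an assumption about how such quantization can be done, not a description of what any particular OS actually does.

```python
def quantize_steps(counts, multiplier):
    """Convert mouse counts to whole-pixel pointer steps, carrying the remainder."""
    steps = []
    remainder = 0.0
    for count in counts:
        exact = count * multiplier + remainder
        step = int(exact)          # only whole pixels can be applied
        remainder = exact - step   # the fraction is kept for the next report
        steps.append(step)
    return steps


print(quantize_steps([1] * 6, 1.5))   # [1, 2, 1, 2, 1, 2]
print(quantize_steps([1] * 6, 1.0))   # [1, 1, 1, 1, 1, 1]
```

Uniform input yields uniform output only when the multiplier is a whole number.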

Although the coarse-grained / uneven pointer motion stems from inadequate sensor resolution, operating system may try to mend the problem with software means. For example, Windows provides an option called “Enhance pointer precision”, which reduces mouse sensitivity to 1.0 when mouse is moved slowly:

Enhance pointer precision

In a sense, this mode is “on-demand deceleration”, and while it helps to increase the precision (but only down to 1-pixel level), it gives rise to nonlinear pointer response that complicates hand-mouse-eye coordination (more info). The effectiveness of such a mode is limited to relatively slow movements (because of how USB polling works, as we will see next). Besides, there’s no guarantee that pointer acceleration is supported / enabled in a particular OS.

Only a mouse with sufficient sensor resolution can reliably provide stable 1-pixel pointer step (but we cannot guarantee that user has such a mouse). If you do have a mouse with adjustable sensitivity, you should never use sensitivity as a means to control pointer speed (despite “common knowledge”). Instead, it’s reasonable to select the highest CPI (DPI) possible and then rely on OS’ mouse settings to set desirable speed, while keeping the precision. If you have multiple mouses or touchpads with different resolutions, you can use something like EitherMouse to control the speed / acceleration on per-device basis.

3.1.4 Polling interval

When a mouse registers a “count”, it doesn’t report it via USB immediately. Instead, it waits for the USB host to request the transmission. These requests are performed at a fixed interval, known as the “polling interval”. Though USB devices request a desired polling interval on initialization (down to 1 ms), operating systems usually force a 10 ms polling interval for low-speed / full-speed (USB 1.x) devices, which is subsequently rounded to 8 ms (the nearest power of 2) by the USB controller. As a result, a typical mouse can send no more than 125 reports per second (125 Hz polling rate). In Windows, you can use Mouse Rate Checker to determine the polling frequency experimentally (there’s also a web-based alternative):

Mouse Rate Checker

As soon as the frequency of generated reports exceeds the polling rate, there’s no way to receive those reports one by one. It’s easy to calculate that, for my 500 CPI mouse, this limit is exceeded when the mouse speed is above ~6 mm / s, which is a surprisingly low threshold. Because report generation and report transmission are not synchronized, the effect can occasionally appear at much lower mouse speeds, when the transmission channel is not fully saturated.

What happens when the mouse accumulates multiple reports before the transmission? Those reports are merged into a single HID report inside the mouse, so there’s no way to determine whether there were multiple small steps or a single large one.
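The threshold can be checked with a few lines of arithmetic. In this sketch, rounding the interval down to a power of two is an assumption (for 10 ms it gives the same 8 ms as the nearest power of two), and the function names are made up for illustration.

```python
MM_PER_INCH = 25.4


def effective_interval_ms(requested_ms: int) -> int:
    """Largest power of two that does not exceed the requested interval (>= 1 ms)."""
    return 1 << (requested_ms.bit_length() - 1)


def saturation_speed_mm_s(cpi: float, polling_rate_hz: float) -> float:
    """Mouse speed at which exactly one count is produced per poll."""
    return polling_rate_hz / cpi * MM_PER_INCH


interval = effective_interval_ms(10)        # 8 ms
rate = 1000 / interval                      # 125 reports per second
speed = saturation_speed_mm_s(500, rate)    # about 6.35 mm/s
print(interval, rate, round(speed, 2))
```

At 1000 Hz the same 500 CPI sensor saturates only at about 51 mm/s.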

Insufficient polling rate can increase pointer step even more than insufficient sensor resolution. For example, on moderate-speed movements, my mouse emits 5-7-count reports at 125 Hz, while reporting each count separately at 1000 Hz. That is a 5-7x step increase, yet there’s only 1.5x step increase owing to the low-resolution sensor (we need to multiply the two values to get the net result).

Because report aggregation happens sporadically, it also disrupts step uniformity (much more than the quantization error).
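The effect of polling on step size and uniformity can be reproduced with a toy simulation. The timing below (one count every 1.5 ms) is a hypothetical steady movement, not measured data, and the merging rule is a simplification of what happens inside a real mouse.

```python
def merge_counts(count_times_ms, polling_interval_ms):
    """Sum the counts that fall into each polling interval into one report."""
    reports = {}
    for t in count_times_ms:
        slot = int(t // polling_interval_ms)
        reports[slot] = reports.get(slot, 0) + 1
    # Intervals without any movement produce no report at all.
    return [reports[slot] for slot in sorted(reports)]


times = [i * 1.5 for i in range(16)]   # 16 counts spread over 24 ms
print(merge_counts(times, 8))          # 125 Hz:  [6, 5, 5]
print(merge_counts(times, 1))          # 1000 Hz: sixteen separate 1-count reports
```

Even with perfectly regular input, the 125 Hz reports are both larger and uneven.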

Thus, high sensor resolution per se is not enough: the mouse must enforce an appropriate polling rate to preserve a 1-pixel pointer step (and we cannot expect this from a usual mouse). If you have a mouse with a high-resolution sensor, make sure that its polling rate is set to the maximum possible value (usually, 1000 Hz), otherwise the effective resolution might be reduced (moreover, even a typical mouse can benefit from an increased polling rate: my mouse can easily generate up to 1000 reports per second, given the opportunity).

3.1.5 Temporal resolution

While physical mouse movement is continuous, pointer position is updated discretely. Temporal resolution refers to how frequently we can update the position of the mouse pointer. We need both spatial- and temporal resolution to be sufficient to consider the input “smooth”.

What temporal resolution is enough? We should aim for at least the typical display refresh rate (60 Hz). However, as the input events and the output frames are not synchronized, it’s better to aim for at least twice as much. Another consideration is monitors that support higher refresh rates (up to 240 Hz so far). Can we really achieve that? Yes and no.

The frequency of the updates is influenced by multiple factors:

  • physical mouse speed — the faster we move the mouse, the more frequently it generates reports,
  • sensor resolution — the higher the resolution, the more updates are generated for the same physical movement,
  • polling rate — obviously, the frequency of updates cannot exceed the mouse polling rate,
  • pixel density — the smaller the pixels, the more fine-grained updates can be reported by the operating system.

Depending on the particular combination of those factors, temporal resolution might be either extremely high, or extremely low.

First, a low-resolution mouse generates only about 1-3 counts per second during really slow movements. But even if we use a mouse with a high-resolution sensor and a 1000 Hz polling rate, pixel density will limit the rate of pointer position updates to almost the same number: there are simply not enough pixels along the pointer path to generate more frequent updates when the pointer speed is low. So, display density affects not only the mouse’s spatial resolution, but its temporal resolution as well.

Alternatively, when we move the mouse fast, we can easily generate too many updates, especially when the polling rate is high and the pixel size is small (up to 1000 updates per second). In such a case, we can easily overwhelm our rendering algorithm if we choose to update the image on every input event.

The only way to substantially improve the temporal resolution is to have both a high-res monitor and a high-precision mouse, but even then the update frequency will be variable and often non-optimal.

Those who want to dig deeper may also read about polling precision and pointer microstutters.

3.1.6 Software processing

In addition to the hardware report aggregation, operating systems and widget toolkits might do event aggregation of their own.

For example, Xlib Programming Manual explicitly states that

The granularity of MotionNotify events is not guaranteed, but a client that selects this event type is guaranteed to receive at least one event when the pointer moves and then rests.

— that sounds hardly reassuring.

Likewise, Java, for example, has Component.coalesceEvents method for aggregating input events. The description states that

For mouse move events the last event is always returned, causing intermediate moves to be discarded.

However, this functionality seems to be disabled by default, which is a good thing.

Unlike Java, GTK 3 enables “event compression” by default, which not only merges several motion events together, but also delays the delivery of events (up to the next frame rendering) and skews the timing. This might pose a problem for applications that require more fine-grained input.
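To make the idea of event coalescing concrete, here is a minimal sketch that keeps only the last motion event within each frame interval. It is a generic illustration and does not reproduce the actual Java or GTK implementations; the function name and the event representation are assumptions.

```python
def coalesce_per_frame(events, frame_interval_ms=1000 / 60):
    """Keep only the last (time_ms, position) event within each frame interval."""
    latest = {}
    for time_ms, position in events:
        frame = int(time_ms // frame_interval_ms)
        latest[frame] = (time_ms, position)   # later events overwrite earlier ones
    return [latest[frame] for frame in sorted(latest)]


# 1000 Hz input for 50 ms, with the pointer moving 1 pixel per event.
events = [(t, t) for t in range(50)]
print(coalesce_per_frame(events))   # [(16, 16), (33, 33), (49, 49)]
```

Fifty 1-pixel steps turn into three steps of 16 or 17 pixels each, which is fine for rendering but loses the fine-grained timing.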

3.1.7 Mapping precision

To perform scrolling, we use the pointer to drag a “thumb” on a scrollbar. The thumb itself is positioned on the screen with 1-pixel precision, and that position determines the resulting scrolling position.

Because scrolling, by definition, implies that content size can be greater than view size, content length can be arbitrarily greater than the length of the corresponding scrollbar’s track. This mismatch results in suboptimal positioning precision. For example, if content height is, say, 10000 pixels and view height is 1000 pixels, a single track pixel maps to 10 content pixels, and that can be the height of a text line (strictly speaking, we need to account for the thumb length and the length of possible arrow buttons, but the simplified calculation captures the essence well enough):

Scrollbar mapping

Thus, 1-pixel pointer steps can result in 10-pixel content steps, and 1-2-pixel inaccuracies can turn into 1-2-line inaccuracies (which is more than noticeable with the naked eye). The bigger the content and the smaller the view, the lower the positioning precision. In extreme cases, when a 1-pixel “thumb step” maps to a “content step” that is greater than the corresponding view size, supposedly continuous scrolling will even skip portions of the content. Evidently, 1-pixel pointer precision is not enough to control the scrolling position directly.
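The mapping can be written down explicitly. The sketch below assumes a track without arrow buttons and a thumb whose length is proportional to the visible part of the content; real toolkits may use different rules (for example, a minimum thumb length).

```python
def thumb_to_content(thumb_px: int, track_px: int, content_px: int, view_px: int) -> float:
    """Map a thumb offset on the track to a content offset, both in pixels."""
    thumb_len = track_px * view_px / content_px   # proportional thumb (assumed)
    thumb_range = track_px - thumb_len            # how far the thumb can travel
    content_range = content_px - view_px          # how far the content can travel
    if thumb_range <= 0:
        return 0.0                                # nothing to scroll
    return thumb_px * content_range / thumb_range


print(thumb_to_content(1, 1000, 10000, 1000))       # 10.0 content pixels per track pixel
print(thumb_to_content(1, 1000, 2_000_000, 1000))   # 2000.0, twice the view height
```

In the second call a single thumb pixel moves the content by two full views, so half of the content can never be shown by dragging.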

To improve the positioning precision, we need subpixel pointer precision (you may see a good video on that topic), however, such a solution is yet to be implemented in reality. The closest thing today is probably libinput (supports subpixel movements) + GTK (uses floating-point numbers for almost everything) combination, but it’s still not there, because some internal data is still represented as integer numbers and subpixel accuracy is lost along the way (XInput 2 also reports subpixel coordinates, but only for touchpads; Qt is also moving in the right direction). Keep in mind though, that subpixel positioning, by itself, requires actual high-resolution input devices to be effective.

Ideally, we should not rely on the physical resolution of a particular input device: the operating system should decelerate the pointer to the required degree of subpixel precision (which depends on the view size / content size ratio) dynamically, on demand, when the user drags the thumb and the pointer is moved slowly. This could be a real solution to the inherent scrollbar imperfection, yet that solution requires a new kind of API, which is still a matter for the future. Currently we have what we have and we must deal with it.

Although, at the application level, we cannot increase the positioning precision, we can artificially level out the position transitions, so that the scrolling will be much smoother, albeit not more precise.

3.1.8 Summary

Let’s summarize the key findings:

  • pointer step is often suboptimal,
  • pointer movement might be nonuniform,
  • pointer update frequency is not guaranteed,
  • event rate might interfere with the frame rate,
  • 1-pixel pointer precision is still not enough.

So, it’s hardly a good idea to rely on pointer position directly to control scrolling. Sadly, this is what most applications and widget toolkits still do, even those that implement other kinds of “smooth scrolling”. For example, that’s why Safari’s touchpad scrolling is much smoother than its scrollbar scrolling (and that is, actually, unjustifiable). Using a high-resolution mouse can significantly improve the smoothness (e.g. when I use Zowie FK1 at 3200 CPI / 1000 Hz the difference is clearly noticeable). However, some limitations are inherent and cannot be solved by a better mouse.

Instead, we need to filter / interpolate the input data and render the image asynchronously if we want scrolling to be truly smooth. As pointer steps are “magnified” by the scrollbar, we need to apply the transformations at the level of scrolling position, not at the level of pointer position.

On relatively fast movement, when temporal resolution is sufficient, dynamics of mouse motion (produced by inertia of hand / mouse) is mirrored by the pointer movement naturally, while on slow movement we may choose to go beyond linear interpolation and to emulate dynamics via easing curves.

Because filtering and interpolation introduce an additional delay in the input processing, we need to make a reasonable tradeoff between the smoothness and the input lag, as we also want to keep the scrolling latency to a minimum.
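One simple way to filter the scrolling position is exponential smoothing toward the target set by the input. This is a minimal sketch of the general idea, not the filter used in IntelliJ IDEA; the 40 ms time constant is an arbitrary assumption that represents the smoothness / lag tradeoff.

```python
import math


def smooth_position(current: float, target: float, dt_ms: float,
                    time_constant_ms: float = 40.0) -> float:
    """Move current toward target; the result does not depend on the frame rate."""
    alpha = 1.0 - math.exp(-dt_ms / time_constant_ms)
    return current + (target - current) * alpha


# A single thumb pixel has moved the target by 10 content pixels.
position, target = 0.0, 10.0
trace = []
for _ in range(10):   # ten frames at 60 Hz
    position = smooth_position(position, target, dt_ms=1000 / 60)
    trace.append(position)
print([round(p, 2) for p in trace])
```

A larger time constant gives a smoother transition but makes the content trail further behind the thumb.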

An interesting side note: 3D games that popularized free look (“mouselook”) faced the same problem of insufficient mouse resolution / interference and had to add mouse interpolation. In the case of the Quake series, we can see this first-hand in John Carmack’s “.plan files”.

3.2 Mouse wheel

The mouse wheel (or “scroll wheel”) is a ubiquitous means of scrolling on desktop computers. A definite advantage of wheel scrolling is that it allows the mouse pointer to remain close to the content rather than moving to a scrollbar. Obviously, a mouse wheel can rotate only vertically, so many applications rely on the Shift key as a modifier to scroll horizontally.

Mouse wheel

Typical mouse wheel input is highly discrete, because, at the level of hardware, the mouse uses optical or mechanical switches to detect rotations that correspond to definite tactile wheel “steps” (many mouses that have a “free spinning” wheel are no better: despite the absence of notches, registered events are still coarse-grained). Thus, no matter how slowly you rotate the wheel, the mouse reports only a single discrete event for each wheel step. Although there exist mouses with a “high-resolution scroll wheel” that can detect smaller angular steps, those mouses are still quite rare.

How much should we scroll on each wheel step? Discrete wheel steps naturally correspond to discrete text lines. When the scrolling model is line-based, the mouse wheel steps over a number of lines; when a pixel-precise model is in use, the mouse wheel increment / decrement equals the exact pixel height of those lines.

Because the direct “1 step = 1 line” correspondence results in rather slow scrolling, operating systems employ different strategies to increase the wheel scrolling speed. In Windows and Linux, mouse settings allow us to select a fixed number of lines (or even “one screen”) that should be scrolled on each wheel step:

Mouse Wheel Properties

The default value is 3, so mouse wheel scrolling is even more coarse-grained. Note that this setting affects both scrolling speed and scrolling precision simultaneously, and we can trade one for the other. I have often heard a recommendation to set the number of scrolled lines to 1 to increase the scrolling precision (and thus, smoothness in legacy applications); however, this tip is hardly practical because it makes scrolling unbearably slow.

Mac OS approaches wheel scrolling differently: it applies “scrolling acceleration”, so that, initially, the mouse wheel scrolls content by a single line (or a part of a line), but when the frequency of steps increases (i.e. when the wheel is rotated faster), the wheel step increases dynamically and can be arbitrarily large. In a sense, this is a better solution to the tradeoff between scrolling precision and scrolling speed; however, it poses a different challenge, as the non-linear response makes it harder to estimate the scrolling distance in advance.
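With a pixel-precise model, the conversion from wheel steps to pixels is a simple product. The sketch below assumes a uniform line height; the function name is made up for this example.

```python
def wheel_delta_px(steps: int, line_height_px: float, lines_per_step: int = 3) -> float:
    """Pixel distance for a number of wheel steps (negative steps scroll up)."""
    return steps * lines_per_step * line_height_px


print(wheel_delta_px(1, 16))                     # 48 pixels with 16-pixel lines
print(wheel_delta_px(1, 24))                     # 72 pixels with 24-pixel lines
print(wheel_delta_px(1, 16, lines_per_step=1))   # 16 pixels: more precise, 3x slower
```

The same wheel step moves the content by a different pixel distance as soon as the font size changes.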
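The general shape of such acceleration can be illustrated with a toy curve in which the step grows as the interval between wheel events shrinks. This is emphatically not the actual Mac OS algorithm: the curve, the 200 ms threshold and the gain cap are all hypothetical values chosen for illustration.

```python
def accelerated_lines(interval_ms: float, base_lines: float = 1.0,
                      slow_interval_ms: float = 200.0, max_gain: float = 10.0) -> float:
    """Lines to scroll for one wheel step, given the time since the previous step."""
    if interval_ms >= slow_interval_ms:
        return base_lines                         # slow rotation: precise single lines
    gain = slow_interval_ms / max(interval_ms, 1.0)
    return base_lines * min(gain, max_gain)       # fast rotation: larger steps


for interval in (500, 100, 50, 5):
    print(interval, accelerated_lines(interval))
```

Because the step depends on timing, the same number of wheel clicks can cover very different distances, which is exactly why the resulting distance is hard to predict.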

Because wheel step is tied to particular lines, mouse wheel scrolling exhibits all the quirks of line-based scrolling model even when the model is pixel-precise, namely:

  • visual scrolling speed depends on document line height,
  • visual speed varies between different applications,
  • the speed is non-linear when line height is variable.

Moreover, with a pixel-precise scrolling model, wheel scrolling has one more quirk: when the current scrolling position is “inside a line” (for example, as a result of scrollbar usage), a subsequent wheel step might shift the content only to the next line boundary, which can be as close as 1 pixel, and such a step feels “partially skipped”.

As the discreteness of a typical scroll wheel maps well to the discreteness of lines, all the quirks are somewhat “natural”. However, for a mouse with a “free-floating” scrolling wheel (or, especially, with a high-resolution scrolling wheel), all those peculiarities are indeed “quirks”. For step-free input, visual speed that is uniform and consistent across different applications is a much more reasonable solution. However, such a control requires a different way to represent scrolling distance and not all OSes support that yet (as we’ll see later when discussing touchpads).
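The “partially skipped” step is easy to see in a sketch of a one-line wheel step that snaps to line boundaries. Uniform line height is assumed, and the function is a hypothetical illustration rather than the behavior of a specific toolkit.

```python
import math


def next_line_boundary(position_px: float, line_height_px: float, direction: int) -> float:
    """Position after one wheel step that stops at the adjacent line boundary."""
    line = position_px / line_height_px
    if direction > 0:
        target = math.floor(line) + 1   # next boundary below the current position
    else:
        target = math.ceil(line) - 1    # previous boundary above the current position
    return max(0.0, target * line_height_px)


print(next_line_boundary(47, 16, +1))   # 48: the content moves by only 1 pixel
print(next_line_boundary(47, 16, -1))   # 32: the content moves by 15 pixels
print(next_line_boundary(48, 16, +1))   # 64: a full 16-pixel step from a boundary
```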

Both the spatial and temporal resolutions of typical mouse wheel input are insufficient for smooth scrolling if we use that input to set the scrolling position directly. However, that’s exactly what most applications still do, and that’s why scrolling in those applications is not smooth at all. To make wheel scrolling smooth we need to explicitly animate the steps. Because it’s relatively easy to add such an animation separately, there exist many browser and IDE plugins that try to mend the neglect and promise “smooth scrolling”, but their effect is usually limited to mouse wheel scrolling, and, in isolation from corresponding rendering improvements, their effectiveness is often subpar compared to possible built-in implementations. Fortunately, quite a few major applications (e.g. web browsers) have recently implemented the wheel scrolling animation natively.

From the user’s viewpoint, step animation is supposed to mirror the dynamics of the physical wheel rotation; however, all we have is a single discrete event that lacks any information about the original motion. The input event can be generated at different phases of the physical movement, depending on the particular mouse model: some mice generate an event at the beginning of the wheel rotation (before the “click”), some in the middle (together with the “click”), and others at the end of the rotation (after the “click”). When the event is postponed, the user perceives it as “input lag”, and there’s not much we can do about it.

An important consideration is the duration of the first animation step. We should aim for the duration of the original physical movement; however, this duration varies depending on the actual wheel rotation speed, which, for the first step, is unknown. It seems that a period of about 140 ms feels most natural. If we increase this duration, the scrolling might feel “laggy”, especially considering that, at the moment the animation starts, some part of the physical rotation has already been performed. Likewise, we shouldn’t make the animation of the initial step too short, because if we do, it will be stopped before a potential subsequent wheel event can be received, and that disrupts scrolling continuity.

Duration of subsequent wheel steps can be predicted with better accuracy because we can rely on the time elapsed between adjacent events to approximate the wheel rotation speed. Unlike a single wheel step, a series of wheel steps reproduces the general dynamics of the motion well enough. In that case, animation can be considered interpolation.
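The choice of duration described above can be sketched roughly as follows. This is only an illustration in Python: the 140 ms figure comes from the text, while `MIN_DURATION_MS`, `MAX_GAP_MS` and the class name are hypothetical values and names picked for the example.

```python
FIRST_STEP_MS = 140.0    # duration of the first step, as suggested in the text
MIN_DURATION_MS = 30.0   # hypothetical lower bound (an assumption)
MAX_GAP_MS = 300.0       # hypothetical: a longer pause starts a new series


class StepDurationEstimator:
    """Chooses an animation duration for each discrete wheel event."""

    def __init__(self):
        self.last_event_ms = None

    def duration_for(self, event_time_ms):
        previous = self.last_event_ms
        self.last_event_ms = event_time_ms
        if previous is None:
            # First step: the rotation speed is unknown, use the default.
            return FIRST_STEP_MS
        gap = event_time_ms - previous
        if gap > MAX_GAP_MS:
            # The wheel was at rest, so treat this as a new first step.
            return FIRST_STEP_MS
        # Subsequent step: the time between adjacent events approximates
        # the rotation speed, so the animation lasts about as long.
        return max(MIN_DURATION_MS, min(gap, FIRST_STEP_MS))
```

With these assumed bounds, a fast series of events produces short animations that chain into one continuous motion, while an isolated event falls back to the default duration.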

Physical motion of the mouse wheel is not uniform: it has a period of acceleration at the beginning and a period of deceleration at the end (which, again, depend on the particular mouse construction). We may simulate that by applying “ease-in” and “ease-out” curves (also see an interactive tutorial). Some toolkits provide helpers for doing that, e.g. Qt’s QEasingCurve or JavaFX’s Interpolator.

Here’s an example of the cubic ease in / out curve:

Ease in out curve
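For reference, one common formulation of a cubic ease in / out curve is sketched below in Python. The function names are made up for this example, and the exact curve plotted in the figure may differ.

```python
def ease_in_out_cubic(t):
    """Maps linear progress t in [0, 1] to eased progress in [0, 1]."""
    t = max(0.0, min(1.0, t))
    if t < 0.5:
        # Ease-in half: slow start, accelerating.
        return 4.0 * t ** 3
    # Ease-out half: decelerating towards the end.
    return 1.0 - ((-2.0 * t + 2.0) ** 3) / 2.0


def animated_position(start, end, elapsed_ms, duration_ms):
    """Scrolling position at a given moment of a step animation."""
    progress = elapsed_ms / duration_ms if duration_ms > 0 else 1.0
    return start + (end - start) * ease_in_out_cubic(progress)
```

Halfway through the animation the content is exactly halfway between the two positions, but it moves slowest near the start and the end, which roughly imitates the wheel’s own acceleration and deceleration.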

When the animation accurately reflects the dynamics of the physical motion, the resulting scrolling feels very natural and the user might even think that the system indeed tracks fine-grained wheel movements. While, on the whole, this is a desirable outcome, some users might be baffled by the lack of positioning precision (e.g. 3 lines) alongside the seemingly precise control.

A high-resolution mouse wheel is a special case, which requires a different API. We’ll discuss that later, together with touchpads. Additionally, because Mac OS applies “acceleration” to mouse wheel input and can report a “partial rotation” (starting from 1 / 10) on a complete wheel step, this should also be regarded as a particular case of a high-resolution wheel (though, I must say, such an approach works decently only with Apple mice, for example, with MightyMouse, because it has a trackball instead of a mouse wheel; MagicMouse goes even further and uses a touchpad).

There are other triggers of scrolling that are very similar to the mouse wheel in their “discreteness”:

The same recommendations are applicable to these cases (animate the transitions, apply ease-in / ease-out curves, etc). It’s reasonable to tune the animation duration depending on a particular scrolling distance.

Interestingly, there exist many JavaScript libraries that add smooth transitions to in-page navigation in place of instant jumps, which obscure the relationship between different parts of the content (however, major web browsers nowadays support that feature out of the box).

3.4 Touchpad

Unlike the scroll wheel, a touchpad (or “trackpad”, as Apple calls it) has an inherent ability to generate precise input to control the scrolling (by using “edge scrolling”, “two-finger scrolling” or “circular scrolling”). The only reason why this has not been the case for so long is that, since its inception, the touchpad has been pretending to be a mouse in all respects.

Touchpad

Early touchpads were functionally equivalent to a computer mouse: a sensor to control the pointer and a few buttons, with no multi-touch, no force touch, no gestures. The temptation to use existing mouse protocols and APIs was too great, so we ended up with touchpads that have a virtual scroll wheel (with the “precision” of the mouse scroll wheel, 3 lines by default). In addition to the loss of precision, such an approach conceals the fine-grained dynamics and increases the latency: if you move your fingers slightly, yet not enough to trigger the many-line scrolling, no visible feedback is produced.

After it became apparent that touchpad is quite a different beast, it was too late — the existing scheme worked decently, while introducing a new kind of input required upgrades at all levels, namely:

  1. hardware protocols,
  2. device drivers,
  3. OS APIs,
  4. widget toolkits,
  5. applications.

If any element of that chain is missing, the end result is as good as none. So there were not enough incentives to revise any of the elements separately. Only Apple, being a company that controls all parts of the integration, managed to overcome inertia and produce a decent solution early on.

Now we’re finally approaching a point where PC touchpads can (sometimes) be used for high-precision scrolling, but we’re not fully there yet. Let’s put all the pieces of this puzzle together.

3.4.1 Hardware protocol

Although the PS/2 port is now considered legacy, most laptop touchpads are connected via PS/2 under the hood. Some touchpads (notably, Apple’s) use USB for connection. There also exist I2C, SPI and SMBus touchpads, but those are in the minority (though SPI is used in 2015-2016 MacBook Pro models).

After initialization, touchpads rely on PS/2 or USB HID protocols to function as a basic computer mouse. This is convenient, because in the mouse mode (known as “relative mode”) touchpad doesn’t require a special driver and can work anywhere (for example, in UEFI). However, in this mode touchpad usually doesn’t generate scrolling events at all — try to use MacBook’s touchpad in Windows without Bootcamp to get the idea.

Receiving low-level data about finger positions, contact areas, pressure, etc. requires sending a special sequence, after which the touchpad switches to an extended version of the communication protocol. Because there was no such thing as a “standard touchpad protocol”, each vendor introduced its own proprietary format (for example, see Synaptics Touchpad Interfacing Guide PDF). All proprietary protocols require special drivers to communicate with the touchpad and interpret the data (so it’s the driver that actually handles the scrolling).

In Windows 8.1 Microsoft introduced HID-based Precision Touchpad which aims to be a standard touchpad protocol that doesn’t require a special driver to access the low-level data. However, such touchpads are still uncommon.

From hardware standpoint, almost any touchpad is capable of producing (relatively) fine-grained input to control the scrolling. Unless touchpad emulates scrolling events in firmware (which some touchpads might rarely do), scrolling precision is largely dependent on the touchpad driver.

3.4.2 Windows

Most OEM Windows installations bundle touchpad drivers, so one rarely needs to download and install them manually. Additionally, Microsoft requires that touchpad drivers should be available via Windows Update. Although it’s easy to obtain a driver, there’s no way to tweak or modify it, as Windows touchpad drivers are usually proprietary software that implement vendor-specific protocols (e.g. if Apple’s BootCamp driver doesn’t support high-precision scrolling, we cannot do much about it).

Suppose touchpad driver receives the low-level data and recognizes a scrolling gesture, what should it do? In Windows Touchpad Gesture Implementation Guide Microsoft states that it should “issue horizontal or vertical wheel inputs, depending on the direction of travel of the fingers”. So, the driver is supposed to emulate a mouse, this time at the software level (because it’s the only API that is supported by most applications).

The emulation is performed by input injection, where the driver uses SendInput function to insert mouse events to the OS event queue. Particularly, the scroll wheel is emulated by sending WM_MOUSEWHEEL messages (more info). For horizontal scrolling, driver either emulates a simultaneous Shift keypress, or sends WM_HMOUSEWHEEL (which is less supported by applications).

At this point it might seem that in this model we’re bound to the ratcheting 3-line positioning, but, fortunately, that is not the case. Starting with Windows Vista, Microsoft introduced enhanced wheel support in Windows, which offers high-precision WM_MOUSEWHEEL messages. While I have yet to see a mouse with such a wheel, as an API, this is a handy abstraction.

In theory, a touchpad driver can emit the high-resolution scrolling events to produce much more precise input, and Microsoft recommends doing exactly that (by the way, notice the self-contradictory requirement for the temporal resolution). However, in practice, few drivers choose to proceed along this path. Why so? Because many Windows applications were written with the legacy mouse wheel API in mind and thus cannot correctly handle high-resolution messages: the end result might vary from no scrolling to extremely fast scrolling (by the way, you can use the provided HiResScrollWheelEmulator to test the processing of high-resolution messages). Another potential problem is performance: when the resolution of input events is increased by an order of magnitude, applications that render the events synchronously might be overwhelmed and the scrolling might lag. Most manufacturers chose a bird in the hand over two in the bush and emulate the “good old” low-precision scrolling events. While this way of thinking proved its value in the past, nowadays it’s unnecessarily restrictive.

To add compatibility with the high-precision wheel events, a Windows application needs to:

  1. take delta value received with the WM_MOUSEWHEEL message into account,
  2. accumulate the delta until the minimum amount needed to scroll is reached (at least, 1 pixel),
  3. reset the accumulated delta when the direction of scrolling is reversed.
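The three steps above can be sketched as follows. The logic is written in Python purely for illustration (a real Windows application would do this in its WM_MOUSEWHEEL handler); the class name and the `pixels_per_notch` parameter are hypothetical, and the sign convention is left to the caller.

```python
WHEEL_DELTA = 120  # one full wheel notch in the Windows API


class WheelDeltaAccumulator:
    """Converts (possibly fractional-notch) wheel deltas to whole pixels."""

    def __init__(self, pixels_per_notch):
        self.pixels_per_notch = pixels_per_notch  # integer, e.g. 3 lines * 16 px
        self.pending = 0  # not yet scrolled, in units of 1/WHEEL_DELTA pixel

    def on_wheel(self, delta):
        # Step 3: drop the remainder when the direction is reversed.
        if delta * self.pending < 0:
            self.pending = 0
        # Step 1: use the actual delta value, not just its sign.
        self.pending += delta * self.pixels_per_notch
        # Step 2: scroll only once at least 1 whole pixel has accumulated.
        sign = 1 if self.pending >= 0 else -1
        pixels = sign * (abs(self.pending) // WHEEL_DELTA)
        self.pending -= pixels * WHEEL_DELTA
        return pixels
```

For example, with 48 pixels per notch, three events with a delta of 1 produce 0, 0 and 1 pixel of scrolling, while a legacy event with a delta of 120 produces the full 48 pixels at once. Integer arithmetic is used so that no rounding error builds up over a long series of events.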

To get an idea of what it takes to support high-resolution wheel scrolling events you may take a look at Firefox as an example. As an alternative way to “fix” the compatibility issues, since Windows 8.1 Microsoft provides a method to opt out of the high-precision scrolling messages via the application manifest.

The distance the wheel is rotated is expressed in multiples or fractions of WHEEL_DELTA, which is chosen to be 120 (interestingly, 120 is a smooth number). Because scrolling distance is still measured in wheel rotations, albeit “high-resolution” ones, such a scrolling exhibits all the quirks of line-based scrolling, i.e.:

  • visual scrolling speed depends on document line height,
  • visual speed varies between different applications,
  • the speed is non-linear when line height is variable.

Unlike the mouse wheel, touchpad lacks any perceptible notches, so the underlying rotation / line mapping is completely unobvious to the user (indeed, what is a “rotation” in reference to touchpad, let alone “partial rotation”?). From touchpad-like devices people naturally expect uniform and consistent visual scrolling speed that doesn’t depend on the content data. Besides, the relative precision of 1/120 “rotation” is not too great — if line height is more than 40 pixels, the minimal scrolling step becomes larger than 1 pixel. And, on the contrary, if line height is low, initial scrolling events will be skipped, which results in excessive scrolling latency. So, line height also affects the precision and the latency of scrolling and prevents pixel-precise positioning. All in all, “wheel rotation” is imperfect and leaky abstraction for the touchpad scrolling events.
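The 40-pixel threshold follows from simple arithmetic, assuming the default setting of 3 lines per wheel notch. The helper names below are hypothetical and exist only for this illustration.

```python
import math

WHEEL_DELTA = 120  # delta units per wheel notch


def pixels_per_delta_unit(line_height_px, lines_per_notch=3):
    """Scrolling distance that corresponds to the smallest delta (1/120 notch)."""
    return line_height_px * lines_per_notch / WHEEL_DELTA


def units_before_first_pixel(line_height_px, lines_per_notch=3):
    """How many minimal deltas must accumulate before 1 pixel can be scrolled."""
    return math.ceil(WHEEL_DELTA / (line_height_px * lines_per_notch))
```

With a line height of 40 pixels the minimal step is exactly 1 pixel, and with 80 pixels it is already 2 pixels. With a line height of 16 pixels, on the other hand, the first two minimal deltas produce no visible movement at all, which is the source of the extra latency.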

Because of the limitations of “high-resolution wheel events”, Precision Touchpad integrates with Direct Manipulation API which supports “absolute” transformations that are independent of line / row size, as well as asynchronous rendering (that API was originally introduced in Windows 8 for touch-based scrolling). However, any application that seeks to be cross-platform or needs to function in Windows 7 can hardly depend on such an all-encompassing API. Fortunately, it’s possible to register a window as a Direct Manipulation consumer and receive the pixel-precise input events without involving the rendering parts of the API (more info).

The unified, OS-level Precision Touchpad driver tries hard to handle the scrolling events gracefully:

  1. first, the driver tries to use Direct Manipulation API for “pixel-perfect” scrolling,
  2. then, if application doesn’t support the API (very few do), the driver tries to find and manipulate application scrollbar(s) directly (by sending WM_HSCROLL and WM_VSCROLL messages, like with legacy touchscreen panning), because, unlike the mouse wheel events, scrollbars can support pixel-precise deltas (this step might not work for widget toolkits that don’t use native scrollbars),
  3. finally, if there are neither the Direct Manipulation API, nor a suitable scrollbar available, the driver resorts to sending the high-resolution WM_MOUSEWHEEL messages.

Keep in mind though, that all the goodies of unification require a compatible touchpad in the first place, as the universal driver works only for touchpads that support “Precision Touchpad” protocol. Moreover, because of the lacking application-level support, this presumably unified scrolling can still be very inconsistent in practice.

So, in principle, Windows has everything to support high-precision touchpad scrolling (even without the Precision Touchpad API), yet not all applications handle it correctly and most touchpad drivers choose to play safe. The Precision Touchpad improves things by shifting the gesture processing to the OS itself and by using Direct Manipulation API for pixel-precise scrolling (however, that requires both compatible hardware and proper application support).

3.4.3 Linux

Unlike Windows, Linux has to reverse engineer touchpad protocols. At first, this was a disadvantage, but later on this approach started to bear fruit: the modern Linux kernel supports most touchpads out of the box, and everyone is free to tweak or modify any aspect of this support to their heart’s content (interestingly, MacBook’s touchpads are fully supported in Linux while the official Windows support leaves much to be desired).

Currently touchpad handling in Linux is split between the kernel and the X Window System. The kernel performs low-level communication with the hardware and then exposes device events via evdev’s /dev/input/eventX character devices. We can use cat /proc/bus/input/devices to list the mapping between hardware and event interfaces. For example, on HP 355 G2 I get the following:

I: Bus=0011 Vendor=0002 Product=0007 Version=01b1
N: Name="SynPS/2 Synaptics TouchPad"
P: Phys=isa0060/serio1/input0
S: Sysfs=/devices/platform/i8042/serio1/input/input6
H: Handlers=mouse0 event5

This testifies that kernel driver processes data from a PS/2 port and makes the result available via /dev/input/event5. In a sense, this abstraction accomplishes the same goal as the Precision Touchpad protocol in Windows, though in software, not in hardware.
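The same mapping can be extracted programmatically. Here is a small Python sketch that parses text in the format shown above; the function name is made up for this example, and `SAMPLE` simply repeats the listing from the article.

```python
SAMPLE = """\
I: Bus=0011 Vendor=0002 Product=0007 Version=01b1
N: Name="SynPS/2 Synaptics TouchPad"
P: Phys=isa0060/serio1/input0
S: Sysfs=/devices/platform/i8042/serio1/input/input6
H: Handlers=mouse0 event5
"""


def parse_input_devices(text):
    """Maps each device name to the list of its eventN handlers."""
    devices = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("N: Name="):
            name = line[len("N: Name="):].strip('"')
        elif line.startswith("H: Handlers=") and name is not None:
            handlers = line[len("H: Handlers="):].split()
            devices[name] = [h for h in handlers if h.startswith("event")]
        elif not line:
            # A blank line separates device records.
            name = None
    return devices


print(parse_input_devices(SAMPLE))
```

On a real Linux system the text would come from reading /proc/bus/input/devices instead of the sample string.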

Subsequent data processing is done by synaptics driver inside Xorg server and that’s where the scrolling gestures are handled. Despite its name (which is a legacy), the driver handles non-Synaptics touchpads as well. If we look through /usr/share/X11/xorg.conf.d/50-synaptics.conf we will see something like:

Section "InputClass"
    Driver "synaptics"
    MatchIsTouchpad "on"
    MatchDevicePath "/dev/input/event*"
EndSection

This means that this driver is used for all touchpads in Xorg, even for the MacBook ones (yet Wayland uses libinput as a touchpad driver). We can make sure of that by checking /var/log/Xorg.0.log:

config/udev:
Adding input device SynPS/2 Synaptics TouchPad (/dev/input/event5)
Using input driver 'synaptics' for 'SynPS/2 Synaptics TouchPad'

Now let’s pose the same question — suppose the driver receives the low-level data and recognizes a scrolling gesture, what should it do? In Linux, we don’t have to guess because we have the code: before version 1.7 the driver emulated mouse button clicks — button 4 / 5 for vertical scrolling and button 6 / 7 for horizontal one. Now you might ask “what those mouse buttons have to do with scrolling?”. The answer is that X Server API was even more deficient than Windows’ one — it had no dedicated scrolling wheel events and represented wheel rotation as clicks of special mouse buttons. This was great from compatibility standpoint, but for high-precision scrolling it was a showstopper.
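To make the limitation concrete, here is a Python sketch of how an application might interpret the legacy button events. The mapping follows the usual X convention (4 is up, 5 is down, 6 is left, 7 is right); the function name and the tuple representation are hypothetical.

```python
# Legacy X convention: wheel motion is reported as clicks of extra buttons.
LEGACY_SCROLL_BUTTONS = {
    4: (0, -1),  # scroll up
    5: (0, 1),   # scroll down
    6: (-1, 0),  # scroll left
    7: (1, 0),   # scroll right
}


def button_press_to_scroll(button):
    """Returns (dx, dy) in whole wheel steps, or None for ordinary buttons."""
    return LEGACY_SCROLL_BUTTONS.get(button)
```

Since a button can only be pressed or not pressed, every event is exactly one whole step. There is simply no room in this representation for a fraction of a step, however precise the hardware is.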

The only way to overcome the limitation was to introduce a new API that supports more precise scrolling events. This API happened to be the XInput 2.1 Xorg extension, which introduced high-precision scrolling events. The API is available since Xorg 1.12, and since version 1.7 the touchpad driver uses the new API. By the way, another thing that was introduced in XInput 2 is the separation between device coordinates and screen coordinates (i.e. it’s possible to acquire coordinates with subpixel precision).

If you have xinput package installed, you can easily test whether your touchpad generates high-resolution events in XInput — run xinput test-xi2 --root command and then perform two-finger scrolling, the output should contain something like:

EVENT type 6 (Motion)
  device: 7
  ...
  valuators:
    3: 142.00

The valuator value should reflect even slightest finger movements (you can use xinput list --long to lookup the device / valuator IDs).

While this is a definite improvement over the legacy API in terms of precision, the new API still retains the rotation / line bond: XIQueryDevice function reports “increment” value that “specifies the value change considered one unit of scrolling down”. Thus, the API is an equivalent of Windows’ high-resolution mouse wheel events, and it’s unsuitable for implementing visually consistent, pixel-precise scrolling (unlike XInput 2, libinput can report scrolling directly in pixels).
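The relationship between valuator values and the “increment” can be sketched as follows. This is Python pseudologic rather than actual Xlib code: the class is hypothetical, and the increment of 100.0 used in the example is an arbitrary value (a real one would come from the scroll class info reported by XIQueryDevice).

```python
class ScrollValuator:
    """Turns successive scroll valuator values into fractions of a scroll unit."""

    def __init__(self, increment):
        self.increment = increment  # value change that equals one scroll unit
        self.last_value = None

    def on_value(self, value):
        if self.last_value is None:
            # The first value only establishes a baseline.
            self.last_value = value
            return 0.0
        units = (value - self.last_value) / self.increment
        self.last_value = value
        return units

    def reset(self):
        """Forget the baseline, e.g. when the values may have jumped."""
        self.last_value = None


valuator = ScrollValuator(increment=100.0)  # assumed increment
valuator.on_value(142.0)
print(valuator.on_value(167.0))  # a quarter of one scroll unit
```

Note that the result is still expressed in “scroll units”. To get pixels, an application has to multiply it by its own notion of a unit (for example, 3 lines of a particular height), which is exactly why the visual speed differs between applications.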

Because the API is brand new, there’s no way legacy applications can take advantage of it without implementing the support explicitly. Even though the extension has existed for quite a while, many applications and widget toolkits are still lagging behind. Even Firefox still has this support disabled by default: try running env MOZ_USE_XINPUT2=1 firefox to feel the apparent improvement in the scrolling precision. Chrome is barely catching up as well.

Widget toolkits, such as GTK+ or Qt, deserve special consideration, because applications that use their standard scrolling widgets (GtkScrolledWindow and QScrollArea respectively) may support high-precision scrolling out of the box. As of now, GTK+ 3 and Qt 5 support precision scrolling, while GTK+ 2 and Qt 4 don’t. Using GTK+ 3 or Qt 5 by itself doesn’t guarantee much, because applications might either rely on a non-standard scrolling implementation or use a line-based scrolling model (for example, the terminal window in Ubuntu 16.10 still lacks precision scrolling, and the same goes for the terminal and the file browser in Lubuntu 16.10).

Thus, unlike in Windows, the main obstacle to high-precision scrolling in Linux is not touchpad drivers but applications themselves. Because touchpad input handling is unified, applications are mostly open source and their updates are automated, the future of precision scrolling looks brighter on Linux than on Windows, especially for “legacy” touchpads (yet, there’s no X server API for pixel-precise scrolling so far).

3.4.4 Mac OS

OS X has followed almost the same road as the other operating systems. In Handling Trackpad Events guide Apple confesses that “technically, scroll gestures are not specific gestures but mouse events” with type of NSScrollWheel. In the same way, in OS X Lion they extended the NSEvent properties to supply high-precision deltas.

The fundamental difference is that Apple managed to provide all that as a complete package, so application developers could rest assured that all the parts work out of the box. Still, it’s not that unusual to stumble on an application that doesn’t support high-precision scrolling, even in Mac OS. For example, some code still relies on the legacy deltaY and deltaX properties instead of the recommended scrollingDeltaY and scrollingDeltaX ones (notably, that’s what Java does). Another possible shortcoming is the use of a line / row based scrolling model: for example, the built-in Terminal can only scroll by line, and the same goes for LibreOffice Calc (please note that the discreteness of lines / rows is not a principal limitation, as it is fully compatible with high-precision scrolling).

Early on, Apple recognized that, unlike mouse wheels, touchpads have no “notches”, “rotations” or “increments” that could correspond to lines, rows or other “scrolling units”. The high-precision scrolling is measured directly in pixels, not in fractions of an “increment”. That’s why touchpad scrolling in Mac OS is uniform across different applications and doesn’t depend on the window content. The ability to control the position with pixel-level precision goes well with scrolling acceleration, when initial scrolling events are guaranteed to be 1 pixel shifts, but subsequent steps are increased to facilitate long-distance scrolling. The pixel-precise positioning together with system-wide scrolling acceleration (see below) produces that “Mac OS scrolling experience” people are fond of (so, the essence is in software, not in hardware).

3.4.5 Summary

As we have seen, all the major operating systems can support high-precision scrolling, however, to actually make it happen, the support must be available on many different levels, and each OS has its typical pitfalls that break the chain.

For an OS in a virtual machine (e.g. VirtualBox) it’s even more complicated — both host OS and guest OS must propagate the high-precision events correctly and the virtual machine must be able to properly emulate them. We can greatly increase our chances by exposing touchpad to guest OS directly rather than emulating the device — for example, it’s possible to expose MacBook’s touchpad to a Linux guest inside a Windows host and to enjoy high-precision touchpad scrolling whereas host’s BootCamp driver doesn’t support it.

Java virtual machine also deserves a special consideration — although Java 7 introduced getPreciseWheelRotation method in MouseWheelEvent, proper native processing is available only on Windows (partially supported in Mac OS, not supported in Linux). Besides, JScrollPane doesn’t actually use this data and continues to rely on the old, coarse-grained getWheelRotation method. Moreover, Java API cannot represent “absolute” scrolling deltas, which are generated by Windows and Mac OS.

Although high-precision scrolling is often touted as a means of smooth scrolling, that’s only true for applications that use a naive approach to scroll event processing, and even then the “smoothness” is suboptimal and performance problems can be expected. As we have seen with the mouse pointer, both the spatial and temporal resolutions of input data are not enough to control the rendering directly and interpolation is the way to go. Typical touchpad precision can be even lower than a mouse’s: if we look at the synaptics driver code, we can find comments like:

The default values 1900, etc. come from the dawn of time, when men where men, or possibly apes. … We expect to be receiving a steady 80 packets/sec (which gives 40 reports/sec with more than one finger on the pad. … We use this to call back at a constant rate to at least produce the illusion of smooth motion.

Aren’t those encouraging?

For a properly written scrolling implementation, high-precision scrolling events are mainly good for increasing the precision of positioning and for reducing the latency, yet these are still noble goals. Most modern browsers explicitly distinguish between “high-precision scrolling” and “smooth scrolling”, for example, as one Chrome developer wrote:

The switch is named high-precision rather than smooth to avoid confusion with the scroll interpolation feature known as smooth scrolling.

What if some touchpad can generate sufficiently high-resolution events — can we use those events directly, without interpolation? Unlike the case with mouse pointer, there’s no subsequent magnification of events by a scrollbar, so, in principle, if both spatial- and temporal resolutions are enough for a particular display, we may consider using the events without the prior interpolation. However, in practice, touchpad hardware is rarely capable of such a feat. Moreover, there’s no API to reliably detect these capabilities beforehand. That’s why, for instance, Chrome developers decided to always use the interpolation for touchpads on Windows and Linux.

Now you might ask “…but what about Mac, do they have superior hardware?” While Mac hardware is indeed very good, the underlying reason is different: because Mac OS applies system-wide scrolling acceleration, it can (partially) overcome the problem of insufficient temporal resolution on slow finger motion (as Microsoft stated “If the finger is not moving or is moving very slowly, the frequency can be lower than 30 Hz.”) — by generating small enough deltas to make up for the low event frequency.

That said, there are reasons to employ the interpolation regardless of the event resolution (to improve rendering, as we will see later).

3.5 Features

Acquiring raw user input is only half the story: how we transform and interpret the input is equally important. There exist several important features that can significantly affect the resulting usability.

3.5.1 Pixel-precise scrolling

As we’ve already discussed this topic together with the touchpad APIs, here goes a brief recap that summarizes the major points. There exist several distinct ways to represent the distance to scroll (“scrolling delta”), each method has its advantages when combined with appropriate input devices, and no single method is absolutely superior.

The first method came from the era of text mode — to measure the scrolling distance in lines / rows or a fixed number of lines / rows (in some OSes it’s also possible to use pages instead of lines). This method sets a correspondence between a physical step (“increment”) and a discrete number of visible lines (“scrolling unit”). Thus, it’s a perfect match for coarse-grained scrolling devices that have distinct steps (like a typical mouse wheel or buttons). For such devices, the method is just as relevant today, as back then.

Because unit-based scrolling lacks precision and is hardly compatible with high-precision devices (such as touchpads or high-resolution mouse wheels), all the major OSes proceeded along roughly the same path: they allowed the scrolling delta to be reported as a floating point number instead of an integer. That solved the lack of precision just fine. However, this well-intended decision had some unintended consequences, because it still preserved the original “increment / unit” bond, which makes this method not very suitable for devices that rely on continuous scrolling. Because the actual scrolling distance (in pixels) depends on a particular “unit” height, visual scrolling speed becomes inconsistent across different applications. Another problem is that, although the resolution can be arbitrarily high, there’s no way for an OS to tell what delta corresponds to 1 pixel, which is crucial, for example, for touchscreen-based scrolling. All in all, this approach is the wrong way to support touchpad-like devices: the only input device that matches such an API is “a high-resolution mouse wheel that reports precise deltas, but, at the same time, has distinct physical steps” (this device is conjectural, as high-resolution wheels usually have no notches). Keep in mind that the high-precision angular (or “relative”) deltas can emulate the original coarse-grained deltas, if necessary. Most modern widget toolkits can make use of the high-precision input, e.g.: Windows GUI, Cocoa, GTK+ 3, Qt 5 (with Java Swing as a notable exception).

A better API for devices that offer continuous scrolling (for example, touchpads) is to measure scrolling directly in pixels — in this way, the resulting distance doesn’t depend on particular content metrics, and it’s under complete control of the OS. That’s how we can attain consistent visual scrolling speed across different applications. Please note that it’s not about increasing the precision, it’s about changing the interpretation of the value, which requires special support both at the level of OS and at the level of widget toolkit. Because the concept of pixel-precise (or “absolute”) positioning stretches the existing mouse-based APIs, not all operating systems / toolkits support it so far. The “absolute” positioning only complements, not replaces the existing “relative” APIs, because for step-based devices it makes perfect sense to map the steps to content-dependent units (like lines or rows) and such a mapping cannot be expressed in pixels, as the input subsystem is unaware of where and how the generated events will be used.

Currently only Mac OS has a sane API that supports both absolute and relative scrolling deltas. Windows supports pixel-precise scrolling only with the combination of the Precision Touchpad driver and the Direct Manipulation API (yet Direct Manipulation is too all-encompassing and was never meant to be used just for input). In Linux, Wayland can generate pixel-precise deltas, while the X server still lacks that kind of API.

As for cross-platform widget toolkits: Qt’s QWheelEvent has separate angleDelta and pixelDelta properties. Although GTK+ handles absolute deltas under the hood, it does that without a well-defined API. In JavaFX, ScrollEvent distinguishes between pixel-based and line-based scrolling values, while in Java, MouseWheelEvent contains only “rotations”.

In web browsers, the DOM Level 3 wheel event explicitly separates pixel / line / page deltaMode (the API is now a W3C Working Draft).
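A consumer of such an event typically has to normalize the three modes to pixels. The sketch below mirrors the `DOM_DELTA_*` constants of the wheel event in Python; the function name and the way line and page heights are supplied are assumptions for this example.

```python
# Values of the deltaMode constants of the DOM wheel event.
DOM_DELTA_PIXEL = 0
DOM_DELTA_LINE = 1
DOM_DELTA_PAGE = 2


def wheel_delta_to_pixels(delta, delta_mode, line_height_px, page_height_px):
    """Converts a wheel delta to pixels according to its deltaMode."""
    if delta_mode == DOM_DELTA_PIXEL:
        # "Absolute" delta: use as is, independent of the content metrics.
        return delta
    if delta_mode == DOM_DELTA_LINE:
        # "Relative" delta: the distance depends on the content's line height.
        return delta * line_height_px
    if delta_mode == DOM_DELTA_PAGE:
        return delta * page_height_px
    raise ValueError("unknown deltaMode: %r" % (delta_mode,))
```

Only the first branch gives a scrolling distance that is the same in every application; the other two depend on metrics supplied by the content.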

Pixel deltas are also good for supporting touchscreen scrolling (which, obviously, requires pixel-precise positioning) without a separate API. For example, that’s what Mozilla does by converting touchscreen-specific WM_GESTURE messages to the general-purpose wheel events.

3.5.2 Scrolling acceleration

Scrolling input acceleration (not to be confused with rendering acceleration) is a feature that adjusts input scaling depending on the input velocity. For example, without acceleration, a 1 cm finger movement always results, say, in 100 px of content movement, regardless of the speed (the relationship between the input and the output is linear). With acceleration, however, a slow 1 cm finger movement might result in only 10 px of content movement, while a fast 1 cm finger movement might result in 1000 px of content movement (thus, the relationship is non-linear). The feature mimics acceleration in physics, so that the user input controls not just velocity, but also the rate of change of the velocity.
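A toy model of such a transformation is sketched below in Python. The linear gain curve, the reference speed and the clamping limits are all assumptions chosen to reproduce the numbers from the example above; real operating systems use their own (often undocumented) curves.

```python
def accelerated_pixels(distance_cm, speed_cm_per_s,
                       base_px_per_cm=100.0,   # output without acceleration
                       reference_speed=5.0,    # assumed speed with gain 1
                       min_gain=0.1, max_gain=10.0):
    """Scales the input distance by a gain that grows with input speed."""
    gain = speed_cm_per_s / reference_speed
    gain = max(min_gain, min(max_gain, gain))
    return distance_cm * base_px_per_cm * gain


print(accelerated_pixels(1.0, 0.1))    # slow 1 cm movement
print(accelerated_pixels(1.0, 5.0))    # movement at the reference speed
print(accelerated_pixels(1.0, 100.0))  # fast 1 cm movement
```

The same physical distance thus produces about 10, 100 or 1000 pixels of scrolling depending on how fast the fingers move.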

The resulting dynamics depends on various OS-specific acceleration parameters. For instance, here’s the observed data from Mac OS X Sierra (code for acquiring the data):

Scrolling acceleration

Built-in OS settings can usually adjust only the acceleration rate, while more advanced utilities might provide a way to fine-tune the acceleration curve directly (example). In some cases, the parameters of scrolling acceleration are tied to parameters of the mouse pointer acceleration. In other cases, for example in Linux, pointer acceleration doesn’t cover the scrolling acceleration — tests (code) show that there’s no scrolling acceleration in Xorg.

What’s the purpose of scrolling acceleration? Here are the main advantages:

  • It’s good for scrolling through large distances (which otherwise would be rather tiresome).
  • It can boost the positioning precision on slow scrolling (like “Enhance Pointer Precision” does for the pointer motion).
  • In combination with pixel-precise positioning, scrolling acceleration can be used to implement “pixel-perfect” scrolling — so that the initial delta is guaranteed to be 1 pixel, regardless of the particular hardware precision. In a sense, this is a way to unleash the pixel-precise positioning API in practice. This also reduces the input latency, because the initial hardware event always results in actual scrolling (without preliminary 1 pixel delta accumulation).

Despite the advantages, scrolling acceleration complicates hand-eye coordination and makes it harder for the user’s brain to predict the result of scrolling. Just like with the pointer acceleration, proper implementation is crucial to achieve the predictability.

Generally, it makes sense to apply the acceleration only to devices with continuous scrolling (e.g. touchpads, free-floating wheels, etc). Step-based devices (like a typical mouse wheel) rely on the tight increment / unit bond, which is disrupted by the scrolling acceleration so the result might feel rather unnatural. For example, because Mac OS applies scrolling acceleration to all mouse wheels (Apple mice are “free-floating”, you know) people who use non-Mac mice are often frustrated (that’s a very good description of the problem, by the way). Some users even write petitions asking Apple to disable the acceleration for “external” mice — all in all, a more flexible configuration would be definitely useful (third-party USB Overdrive driver offers that flexibility).

In principle, if a particular OS doesn’t support scrolling acceleration, this feature can be emulated at the level of application (yet there’s no way to distinguish between the input sources). Ideally, scrolling acceleration should be configured system-wide to ensure the uniformity of user experience.

3.5.3 Kinetic scrolling

Kinetic (or “momentum”) scrolling emulates inertia of physical objects and continues the movement for some time after immediate scrolling events are stopped. The inertial scrolling is stopped either by user or by “friction”. Kinetic scrolling feels quite natural and, together with the scrolling acceleration, makes it easier to scroll through large distances.

Kinetic scrolling is triggered by a “flick” gesture when user lifts the fingers, so the feature is applicable only to touchpad / touchscreen scrolling.

Here’s the observed data from Mac OS X Sierra (note that, in the beginning, the image incorporates the acceleration curve from the previous chart, which constitutes the direct user input):

Kinetic scrolling

Interestingly, due to a combination of the initial acceleration and the subsequent “coasting”, the resulting trajectory closely resembles the cubic ease in / out curve which we examined earlier (in application to mouse wheel).

Let’s take a closer look at the scrolling velocity (i.e. at the first derivative of the distance, or at how scrolling delta had been changing over time):

Scrolling velocity

As we can see, the second part of the chart is much more “smooth” — that’s because it contains synthetic, generated data (as opposed to the raw user input).

One way to implement kinetic scrolling is to do that at the level of operating system. For example, Mac OS itself might continue to send scrolling events for some time after user-generated input stops. In Cocoa API applications can rely on momentumPhase property to detect kinetic scrolling. At higher level of abstraction, for example, in Java, those events are indistinguishable from the user-generated input (which is OK, because all the necessary processing is already performed by the OS). Having system-wide support of kinetic scrolling is good for the consistency of user experience.

Windows’ Precision Touchpad driver also aims to support kinetic scrolling system-wide, however, the actual results can be very inconsistent because of the relatively poor application compatibility with high-resolution scrolling in general. If you use a custom touchpad driver, support of kinetic scrolling is at the whim of manufacturer’s fancy.

In Linux, the xf86-input-synaptics Xorg driver implements kinetic scrolling natively (as coasting), while xf86-input-libinput driver doesn’t support this feature by design.

Another way to implement kinetic scrolling is to do that at the level of widgets. This might be necessary when particular OS doesn’t support kinetic scrolling directly. For example, both GTK+ and Qt support the functionality at the level of toolkit: GtkScrolledWindow has a method to enable / disable kinetic scrolling, while Qt has QScroller class that enables kinetic scrolling for any scrolling widget. This approach is more flexible (because clients can differentiate the events), but potentially less consistent — each toolkit might implement the behavior differently (gesture detection, deceleration curve, etc).

If a particular application doesn’t rely on the standard scrolling widgets (e.g. a web browser) and OS doesn’t support kinetic scrolling natively, the only way to add the scrolling inertia is to implement it separately, at the level of application. Although, conceptually, this is relatively easy to do, the task requires processing of low-level “touch” events, because fingers should be lifted off the device for the inertial scrolling to start and user should be able to stop the scrolling simply by placing fingers back on a touchpad / touchscreen. Some libraries, like libinput, define special scroll sources and generate dedicated high-level events that help to implement kinetic scrolling.
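
The inertial part itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the release velocity is taken as already estimated from the last touch events, friction is modelled as exponential decay, and the function name and default parameters are hypothetical.

```python
import math


def momentum_deltas(release_velocity, frame_dt=1 / 60,
                    friction=3.0, min_velocity=5.0):
    """Yield synthetic per-frame scroll deltas (px) after the fingers are lifted.

    release_velocity is in px/s; friction (1/s) and min_velocity (px/s)
    are assumed tuning parameters, not values taken from any real OS.
    """
    velocity = release_velocity
    decay = math.exp(-friction * frame_dt)  # per-frame velocity loss
    while abs(velocity) >= min_velocity:
        yield velocity * frame_dt
        velocity *= decay
```

Because the deltas come from a generator, the caller can simply stop consuming them when a touch-down event arrives, which corresponds to the user stopping the scrolling by placing fingers back on the device.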

3.5.4 Elastic scrolling

Elastic scrolling allows us to temporarily scroll past the edge of the content using a touchpad and to “bounce back” when we remove fingers from the touchpad. This action displays the content at positions that are outside of scrolling model and exposes underlying background:

Elastic scrolling

At first glance, you might think that this feature is just a visual gimmick that has nothing to do with input as such. However, elastic scrolling helps to maintain a consistent mapping between finger- and content positions, and thus to provide a more predictable control and to create a feeling of “firm grip” on the content. Without this feature, excessive scrolling is simply discarded and when we reverse the direction, scrolling is started immediately, from arbitrary position of the fingers — this corresponds to slipping in the physical world, and, just like real slipping, this reduces the steerability.

The only desktop OS that offers elastic scrolling is Mac OS, which introduced the feature in OS X Lion. Existing alternative implementations fall short: GtkScrolledWindow can display visual “overshoot” indication when the content is pulled beyond the end, however the finger / view position correspondence is not preserved; likewise, Precision Touchpad driver in Windows 10 has “touch bounce” feature that yanks window in the direction of overscroll, but that’s just an animation without any input adjustments, moreover, because that animation is applied to the whole window, its effect might be too indirect and obtrusive (there are many guides on how to disable touch bounce in Windows).

In principle, it’s possible to implement elastic scrolling at the level of application, though this would require handling of low-level touch events to detect finger contact — plain scrolling events are not enough. A potential drawback of supporting this feature at the application level is a loss of consistency, because such an application might behave differently from other programs. Widget toolkit is a different story — when toolkit is used as a foundation of OS graphical interface (e.g. in Linux), elastic scrolling can be supported system-wide and so the effect can be (more or less) consistent.
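
A minimal Python sketch of the idea, assuming that the application keeps a “raw” offset that follows the fingers exactly and derives the displayed offset from it. The damping curve below is one simple choice with diminishing returns; the function names, the stiffness constant and the use of the viewport size as the limit are assumptions, not the formula of any particular OS.

```python
def rubber_band(overscroll, limit, stiffness=0.55):
    """Damp an overscroll distance so that it never exceeds `limit` pixels."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    sign = -1.0 if overscroll < 0 else 1.0
    x = abs(overscroll)
    return sign * limit * (1.0 - 1.0 / (x * stiffness / limit + 1.0))


def displayed_offset(raw_offset, max_offset, viewport):
    """Map the raw (finger-tracked) offset to the offset that is shown."""
    if raw_offset < 0:
        return rubber_band(raw_offset, viewport)
    if raw_offset > max_offset:
        return max_offset + rubber_band(raw_offset - max_offset, viewport)
    return raw_offset  # inside the content: one-to-one mapping
```

Since the raw offset is never clipped, reversing the direction first unwinds the overscroll instead of moving the content immediately, which is what preserves the finger / content correspondence.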

4. Rendering

Although high-resolution input is good to increase positioning precision, reproduce dynamics and reduce latency, it’s rendering that ultimately determines the actual “smoothness” of scrolling. Rendering of scrolling has several unique features that are worth detailed examination.

4.1 Performance

The first consideration is rendering performance — application must be able to generate enough frames per second (FPS) for the motion to be perceived as smooth. The temporal sensitivity of human visual system differs between individuals, yet, on average, it takes 240 – 250 FPS for the motion to be considered “realistic”. In practice, feasible frame rate is limited by display refresh rate. The refresh rate of a typical LCD monitor is only 60 Hz, however, displays that support higher rates (e.g. 144 Hz, up to 240 Hz so far) are becoming widespread. Thus, the rendering should aim at least at 60 frames per second (or somewhat higher when V-sync is disabled). Ideally, we need to deliver up to 240 FPS if we are to unleash the full potential of contemporary monitors.

Can a typical desktop application deliver such a frame rate? It is highly unlikely — even computer games often struggle to reach 60 FPS, let alone go beyond that. You might say that computer games are more complicated than desktop applications, but then again, computer games are supposed to be run on powerful specialized hardware that meets “minimum requirements”, with modest screen resolution and reasonable settings to ensure quality / performance balance. Desktop applications, on the contrary, can be run even on a low-powered machine, that is, nevertheless, connected to a 4K-5K display, and there’s no automatic setting fine-tuning. Moreover, game engines are designed with throughput in mind, while most desktop programs have nothing to do with the notion of “FPS”. That worked just fine in the past, but the rise of smooth / high-precision scrolling changed what is considered a sufficient rendering performance of desktop application.

If rendering in a particular program is simple enough to guarantee a decent frame rate, then such a rendering can be used directly, but applications that display more complex graphics can benefit from a neat trick known as “blit acceleration”.

4.2. Blit-acceleration

Scrolling is a very specific type of animation — major portion of the image is simply moved to a new location “as is”, without any modifications. Can we take advantage of this peculiarity to speed up the rendering? It turns out that yes, we can.

Ever since the PDP-10, computers have had so-called block-transfer instructions to efficiently copy one region of memory to another. In the field of computer graphics, this functionality corresponds to bit blit, which allows combining multiple bitmap images. The procedure is usually performed by the GPU, directly in video memory, and it’s extremely fast. Thus, it’s possible to reuse the major part of the image and to paint just the recently exposed part, without re-drawing the whole area.

This technique is widely supported, both at the level of OSes and at the level of widget toolkits. The example of the former approach is Win32’s ScrollWindow (or ScrollWindowEx) function which scrolls contents of the specified window’s client area. The example of the latter approach is Java’s JViewport which uses BLIT_SCROLL_MODE by default. Most standard scrolling widgets nowadays support blit-acceleration out of the box (for instance, see how scrolling works in Cocoa). Besides, it’s also possible for an application to perform bit blit directly, in case of custom scrolling implementation.

After a part of the content is moved, the “exposed” part needs to be re-painted, which is usually done via a standard painting mechanism in OS / toolkit (e.g. WM_PAINT message in Windows). In such a case, application needs to take actual “update region” (e.g. GetUpdateRect) into account and to perform a partial rendering (yet, for simplicity, many applications just re-draw everything regardless of the update region). To maximize the effectiveness of the technique, application must be able to efficiently partition its content area and to draw elements only within given clip region. When it comes to text content, it’s usually easier to isolate a number of discrete lines rather than a specific segment in all the visible lines, so vertical scrolling gets a bigger boost than horizontal one (which is fine, considering the typical use cases).
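
The principle can be modelled in Python with a list of rows standing in for the framebuffer. This is only a toy model: the slice assignment plays the role of the bit blit, and render_row is a hypothetical callback that paints a single content row.

```python
def scroll_with_blit(frame, top, dy, render_row):
    """Scroll `frame` by `dy` rows, repainting only the exposed rows.

    `frame` shows content rows top .. top + len(frame) - 1.
    Returns (new_top, number_of_rows_repainted).
    """
    height = len(frame)
    new_top = top + dy
    if abs(dy) >= height:
        exposed = range(height)              # nothing can be reused
    elif dy > 0:
        frame[:height - dy] = frame[dy:]     # "blit": move the rows up
        exposed = range(height - dy, height)
    elif dy < 0:
        frame[-dy:] = frame[:height + dy]    # "blit": move the rows down
        exposed = range(-dy)
    else:
        exposed = range(0)
    for y in exposed:                        # partial rendering of the update region
        frame[y] = render_row(new_top + y)
    return new_top, len(exposed)
```

The second element of the result shows the saving: scrolling a 50-row view by 2 rows repaints 2 rows instead of 50.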

Interestingly, hardware acceleration of scrolling is exactly how early “side-scrolling” video games managed to compensate for the poor graphics performance of PCs in the early 1990s. Moreover, the “ancient approach” closely resembles what is now considered a “modern approach” (which we’ll discuss later).

4.3 Double buffering

The idea of blit-acceleration looks so useful and elegant, right? OK, here’s the problem: as the existing image is stored in framebuffer, the result of the initial copying will be displayed immediately, possibly much earlier than the subsequent rendering is finished (i.e. those two actions are not “atomic”).

As a consequence, the scrolling can exhibit noticeable “flicker” (even though the partial update is double-buffered). This effect looks almost like screen tearing, except that genuine screen tearing splits the image only horizontally (because of how transmission over video interfaces works), while blit-copying can also split the image vertically (besides, the usual screen tearing cannot be captured using a screenshot — as it’s manifested in the display, not in the framebuffer). When there’s no V-sync (and so the ordinary screen tearing is also present), we might observe “double tearing” (horizontal + horizontal or vertical + horizontal) and that hardly adds to the visual “smoothness”.

Surprisingly, many built-in scrolling mechanisms do nothing to prevent that kind of flickering — for example, if we take Windows’ example on how to scroll a bitmap and add a delay in WM_PAINT handler to emulate complex rendering (code), we can observe that flickering in all its “charm” (and once there was a similar issue in Firefox):

Screen tearing

How can we circumvent this effect? An obvious way is to reorder the sequence of operations that constitute double buffering like this:

  1. pre-render the partial update to the back buffer,
  2. shift a part of the existing image using a bit blit,
  3. copy the pre-rendered image to the framebuffer.

By swapping the first two steps we can minimize the pause between the image shift and the partial update. Particularly, this approach is used in GTK 2.

Yet, if you think about it, such a solution is imperfect — it only minimizes the probability of flickering, not eliminates it. When V-sync is enabled, we can time the steps to reduce the probability more reliably, but when there’s no V-sync, the flickering is still quite possible as there’s no true “atomicity”.

The proper way to attain the atomicity is to maintain a “persistent back buffer”. The typical implementation of double buffering doesn’t preserve the back buffer content between the updates — each update draws over the existing content, so that the back buffer content might resemble something like (code that generated the image):

Back buffer

As a result, the typical double buffering cannot be applied to scrolling. However, in principle, it’s possible for the back buffer to mirror the content in the frame buffer (simply by performing each drawing at a proper position), and in such a case, we can do the following:

  1. shift a part of the existing image in the back buffer using a bit blit,
  2. render the partial update to the back buffer,
  3. copy the back buffer content to the framebuffer.

That’s how we can guarantee that the resulting image will be presented at once. This approach is used, for example, by OpenJDK’s JViewport which relies on so-called true double buffering. Another example is Apple’s NSClipView which performs scrolling in a “backing store”.
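
Here is a toy Python model of the three steps. The class and attribute names are hypothetical, lists of rows stand in for the buffers, and the final slice assignment stands in for whatever single “present” operation the platform offers (real atomicity depends on that operation, not on this sketch).

```python
class ScrollSurface:
    """Scrolling through a persistent back buffer that mirrors the screen."""

    def __init__(self, height, render_row):
        self.render_row = render_row
        self.top = 0
        self.back = [render_row(y) for y in range(height)]  # persistent back buffer
        self.front = list(self.back)                        # stands in for the framebuffer

    def scroll(self, dy):
        height = len(self.back)
        self.top += dy
        # 1. Shift the reusable part of the image inside the back buffer.
        if 0 < dy < height:
            self.back[:height - dy] = self.back[dy:]
            exposed = range(height - dy, height)
        elif -height < dy < 0:
            self.back[-dy:] = self.back[:height + dy]
            exposed = range(-dy)
        elif dy == 0:
            exposed = range(0)
        else:
            exposed = range(height)
        # 2. Render the partial update, still in the back buffer.
        for y in exposed:
            self.back[y] = self.render_row(self.top + y)
        # 3. Present the complete image in a single step.
        self.front[:] = self.back
```

However long step 2 takes, the front buffer keeps showing the previous complete image until step 3, so the shifted-but-not-yet-repainted state is never visible.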

This method also solves the case when there’s a static background image that should not be moved on scrolling (so, we cannot perform a blit in the framebuffer) — in such a case, we can utilize alpha compositing to support independent layers.

A modified version of that approach uses a back buffer that is larger than the visible area, so it’s possible to pre-render the partial update without shifting the existing image at all (at least, for some time) and then to copy the result to the framebuffer, starting from a new position. This method is used mostly by web browsers, because it goes beyond how typical widget toolkits work (it seems that only GTK 3 has adopted this approach so far). Additionally, this technique makes it possible to pre-render yet-to-be-visible parts of the image ahead of time (for example, when user grabbed scrollbar’s thumb / placed fingers on a touchpad, but not yet started scrolling). In principle, it’s even possible to show the image that is incompletely rendered (or rendered at a lower resolution) — to increase the smoothness at the expense of so-called checkboarding.

4.4 Asynchronous rendering

So, now we know all about how to do the rendering, but… when? Surprisingly, “when” is an important topic in itself.

The naive way is to render the incoming scrolling events synchronously, upon receipt, and that’s exactly what most standard scrolling widgets do. Although this solution is great from the point of simplicity, it is seriously lacking from all other standpoints.

The first problem is that, as we already know, the resolution of scrolling events might be either too low or too high for the direct rendering. Both outcomes are equally undesirable:

  1. When the frequency of scrolling events is lower than the display refresh rate, the resulting motion will be perceived as “jerky”. According to the Wikipedia article on frame rate, humans can process up to 15 images per second individually, and, “anything less than 46 frames per second will strain the eye”. As the ultimate goal is 240 FPS and the reasonable goal is, at least, the usual technical limitation of 60 FPS, the rendering frame rate cannot depend on the frequency of input events.

  2. When the frequency of scrolling events is too high, the final outcome depends on a particular machine performance. If the hardware is capable of rendering frames at the higher than needed FPS, then the excessive frames will be (potentially) discarded, yet the CPU / GPU / power consumption will exceed what is really necessary. On less powerful hardware, the outcome is even less favorable — although most OSes / widget toolkits “coalesce” painting requests (for example, see consolidation of WM_PAINT messages), very few of them can merge scrolling requests. As a result, the queued scrolling events will be processed one by one, but at a slower speed, so that the rendering will “lag and freeze” (remember about the compatibility of legacy applications with high-resolution events?).
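
For the second case, one possible mitigation is to merge the requests that piled up while the previous frame was being painted, so that a slow machine paints once per batch rather than once per event. A minimal Python sketch, in which the queue of pending deltas and the helper name are hypothetical:

```python
from collections import deque


def drain_coalesced(pending):
    """Remove all queued scroll deltas and return their sum as one request."""
    total = 0
    while pending:
        total += pending.popleft()
    return total
```

An application would call such a helper once per paint and scroll by the combined delta.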

While the second scenario can be addressed by limiting the frequency of incoming scrolling requests at the level of application (by merging them), the first issue is more fundamental — it’s a typical problem from digital signal processing (DSP), when we need to increase the sampling rate at the output of one system so that another system operating at a higher sampling rate can input the signal. That problem has a typical solution, which is interpolation. The asynchronous scheme works as follows:

  1. Store the incoming scrolling request in a buffer.
  2. Set a timer to schedule the rendering at fixed intervals (say, 8 ms).
  3. Use interpolation to estimate the scrolling position at intermediate points.

That’s how we can smooth out the input and reduce the resolution of disparate input sources to a common denominator — the frequency of rendering (this also prevents beats caused by frequency difference between the event rate and the frame rate):

Interpolation

Please note that, in this context, “asynchronous” means only that we’re processing the events / rendering the frames separately — that doesn’t necessarily imply a separate thread (though, as we will see later, such a thread can be beneficial).
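
The three steps of the asynchronous scheme can be sketched in Python as follows. The class is hypothetical, time is passed in explicitly (in seconds) so that the timer is left to the caller, and linear interpolation over an assumed fixed duration is used only for brevity; an ease-in / ease-out curve could be substituted.

```python
class ScrollInterpolator:
    """Decouples incoming scroll requests from the rendering ticks."""

    def __init__(self, position=0.0, duration=0.1):
        self.start = self.target = float(position)
        self.start_time = 0.0
        self.duration = duration  # assumed animation length, in seconds

    def on_scroll(self, delta, now):
        # Step 1: store the request by retargeting from the current position.
        self.start = self.position_at(now)
        self.target += delta
        self.start_time = now

    def position_at(self, now):
        # Step 3: estimate the position at an intermediate point in time.
        if self.duration <= 0:
            return self.target
        t = (now - self.start_time) / self.duration
        t = min(1.0, max(0.0, t))
        return self.start + (self.target - self.start) * t
```

Step 2 is then a timer that calls position_at on every tick (say, every 8 ms) and renders the result, regardless of how often on_scroll is invoked.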

So far so good, this scheme indeed performs much better than the naive synchronous rendering, but here comes a catch question — what is the ultimate frequency of rendering? It turns out that we can not know beforehand: while typical computer monitors use something around 60 Hz, the exact number might vary from 58 Hz to 60 Hz; laptops might drop the refresh rate to 50 Hz in power-saving mode; gaming monitors support refresh rates up to 240 Hz (so far)… you get the idea.

The display is a thing-in-itself that updates its image at a fixed rate which we cannot control (at the level of application), so we need to “feed” it new frames at very specific intervals, to match the actual moments of the screen updates. The only reliable way to do this is to use vertical synchronization (V-sync). However, vertical synchronization is applied to the whole screen and, unless application runs in a full-screen mode, V-sync is controlled by OS / video driver / window manager — some configurations have it, some don’t.

When V-sync is disabled, the best thing we can do is just that — to set a timer and to render new frames at some reasonable rate (usually, somewhat higher than 60 Hz). In such a case, the display might show information from multiple frames and there will be screen tearing. The tearing is less noticeable on vertical motion (which is our primary use case). Besides, we can reduce the manifestation of this effect by increasing the rendering frequency (so that the screen will have several narrower tears instead of a single wider one). Apart from the tearing, there’s a problem with timing — because the moment of the image rendering differs from the moment of showing the result, the approximated scrolling position might be slightly inaccurate, which lowers the fine-grained motion uniformity. Despite these issues, the absence of V-sync has its advantages: the frequency of rendering can be adjusted continuously and there’s no impact on the latency.

When V-sync is enabled, the story is very different. First, we had better be able to keep up with the display refresh rate — because of how V-sync works, it’s only possible to do the rendering at fractions of the display refresh rate (so, if, for example, the performance is enough only for 59 FPS, we will end up with 30 FPS as a result). Obviously, that’s a serious concern, and that’s why some video adapters offer “adaptive V-sync” option that will only turn on vertical synchronization when the frame rate of the software exceeds the display’s refresh rate (and that’s one more reason to exceed the refresh rate when we use a timer). A complete solution is adaptive synchronization technologies, such as FreeSync and G-Sync, which adapt the display refresh rate to the rendering (this requires support from both the video adapter and the display). That said, we cannot expect any of these technologies to be available in each particular case, so we should be able to handle possible V-sync decently, as is.

Another fundamental problem with V-sync is that, even if we determine the actual refresh rate, we cannot rely on a timer to synchronize the rendering with the display frames perfectly — the resolution of programmable timers is usually insufficient and also configuration dependent (e.g. it depends on whether HPET is present / supported). The imperfect synchronization leads to discarded / skipped frames, which reduces the fluidity of animation. Surprisingly, human eye is very sensitive to small irregularities, so even a single dropped frame is quite noticeable (the effect is known as “jank”). Here’s a very good video that explains the problem in detail.
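
The “fractions of the refresh rate” effect is easy to verify with a little arithmetic. The Python sketch below assumes classic double-buffered V-sync and a constant rendering cost per frame (other buffering schemes behave differently); the function name is hypothetical.

```python
import math


def vsync_frame_rate(refresh_hz, achievable_fps):
    """Effective frame rate when every frame must wait for a refresh boundary."""
    # A frame that misses one refresh has to wait for the next one, so each
    # frame occupies a whole number of refresh intervals.
    intervals_per_frame = math.ceil(refresh_hz / achievable_fps)
    return refresh_hz / intervals_per_frame
```

For a 60 Hz display, an application capable of 59 FPS needs two refresh intervals per frame and thus ends up at 30 FPS, while an application capable of 25 FPS ends up at 20 FPS.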

If interval timer is not enough, what should we do? The answer is V-sync-driven timer — we need an API that tells us when to render the next animation frame, so that the display will fetch it in time, regardless of particular refresh rate. Modern web browsers have such an API, which is requestAnimationFrame. This API allows registering a callback that will be called either in-sync with the display refresh rate (when V-Sync is on), or by timer (when V-Sync is off). Additionally, the callback receives a sub-millisecond frame timestamp that can be used, instead of the current time, to better estimate the animation state (in our case — the interpolated scrolling position).
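
The role of the frame timestamp can be illustrated with a Python stand-in for such a callback. Everything here is hypothetical scaffolding (the factory, the present callback, the millisecond units); the point is only that the position is computed from the timestamp supplied by the frame source, so irregular callback intervals do not distort the motion. The cubic ease-in / ease-out curve is the standard textbook formula.

```python
def ease_in_out_cubic(t):
    """Cubic ease-in / ease-out for t in [0, 1]."""
    return 4 * t ** 3 if t < 0.5 else 1 - (-2 * t + 2) ** 3 / 2


def make_frame_callback(start_pos, end_pos, start_ms, duration_ms, present):
    """Create a callback to be invoked by a V-sync-driven frame source."""
    def on_frame(timestamp_ms):
        # Use the frame timestamp, not the current time, to pick the state.
        t = (timestamp_ms - start_ms) / duration_ms
        t = min(1.0, max(0.0, t))
        present(start_pos + (end_pos - start_pos) * ease_in_out_cubic(t))
        return t < 1.0  # True means "request another frame"
    return on_frame
```

A frame source would keep invoking the callback while it returns True, much like a requestAnimationFrame handler that re-registers itself until the animation is complete.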

Some GUI toolkits offer similar APIs: GTK has GtkFrameClock, Qt Quick scene renderers can rely on OpenGL support of V-sync, JavaFX provides AnimationTimer class. Still, many toolkits (e.g., Java’s AWT / Swing) lack such a support, which makes producing truly smooth animation impossible.

We can improve the scheme even further — because waiting for vertical synchronization blocks the rendering thread, which, in many cases, is also used to process the input events, we can avoid the unnecessary blocking and increase responsiveness by performing rendering in a separate thread. For instance, that was a problem in Firefox, which is now solved by asynchronous panning and zooming (APZ) — the reports are quite encouraging. Another example is Qt Quick threaded renderer loop. Windows Direct Manipulation API also claims that “To optimize responsiveness and minimize latency, Direct Manipulation processing occurs on a separate, independent thread from the UI thread. As a result, output transforms can run in parallel to activity on the UI thread.”

When implemented properly, scrolling animation looks substantially more “smooth” when V-sync is enabled. Because scrolling motion is relatively monotonous and lacks distinct phases, V-sync-induced latency is usually not an issue. However, because V-sync is applied system-wide, other activities (such as typing) might be affected. Ideally, it should be possible to selectively bypass V-sync on latency-sensitive screen updates. Adaptive synchronization technologies, such as FreeSync or G-Sync, are an even better solution that offers the best of both worlds (though they are not yet widespread).

5. Display

Display is probably the least adjustable element in the whole chain. However, certain display characteristics might influence scrolling rather significantly.

One property that is worth mentioning is refresh rate. As display’s refresh rate is the upper limit of actual frame rate, it determines the maximum possible smoothness of scrolling animation. Although most typical computer monitors are stuck at 60 Hz, more capable models are becoming more and more widespread: today’s gaming monitors offer at least 144 Hz, while high-end specimens can boast 240 Hz (i.e. the threshold of “real-life” motion). What is the practical frame rate limit? There’s no single answer to that so far: Wikipedia says “240 FPS when played back at normal speed on a 240 Hz monitor is also near the limits or about of perceivable smoothness”, but you might also check this article or that article (yet keep in mind that the visual system of some people might be especially sensitive, so better be safe than sorry).

Another thing is LCD motion artifacts.

These artifacts might appear as blur, dividing or flickering (in any combination). We’re not going to delve deep into the specifics, suffice it to say that display properties can affect scrolling big time, and all displays are not created equal.

6. Configuration

Although both high-precision scrolling and smooth scrolling are surely nice, there are several use cases that require special consideration:

  • Slow CPU / GPU — in such a case, the hardware might be unable to attain sufficient frame rate, regardless of the rendering optimizations. Since animation is destined to be “janky” anyway, we can resort to the coarse-grained mode, which, in these conditions, might perform better (because it’s at least predictably “janky”).

  • Power saving mode — when a low-power profile is activated, we can be pretty sure that either the battery is low, or the user wants to save the battery. In any case, it makes perfect sense to switch to the coarse-grained scrolling to consume fewer CPU cycles and thus to reduce the power consumption.

  • Remote desktop — when RDP, VNC or other similar technology is in use, the smoothness of animation will be inevitably degraded during the transmission. In spite of that, a lot of CPU resources will be wasted on both sides to produce / encode / decode the image (in addition to overconsuming the network bandwidth). When remote desktop is active, it’s wise to fall back to the old-school scrolling in order to improve the usability and to avoid wasting the resources unnecessarily.

In all these cases scrolling can be adjusted automatically, without user intervention. However, in some instances, user might want to have a direct control. For example, a particular display might exhibit too much ghosting / overdrive artifacts, or, user might choose coarse-grained scrolling simply because of personal preference (keep in mind, that human perception really differs). For such occasions, it’s prudent to have a “Smooth Scrolling” checkbox in OS / application settings (for simplicity, it should probably affect both the interpolation and the processing of high-precision events simultaneously).
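
The decision logic described above could look roughly like the following Python sketch. The Environment fields, the function name and the 30 FPS threshold are assumptions made for illustration; how each condition is actually detected is platform-specific and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    measured_fps: float   # frame rate the application manages to sustain
    power_saving: bool    # a low-power profile is active
    remote_desktop: bool  # the session is displayed via RDP / VNC / etc.


def use_smooth_scrolling(env, user_preference=None, min_fps=30.0):
    """Decide between smooth and coarse-grained scrolling."""
    if user_preference is not None:
        return user_preference  # an explicit checkbox always wins
    if env.power_saving or env.remote_desktop:
        return False
    return env.measured_fps >= min_fps
```

The explicit preference is checked first, so the automatic adjustments never override the user’s own choice.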

7. Summary

Let’s summarize the main findings to get the big picture:

Tip | Rationale
Clearly distinguish between smooth scrolling and high-precision scrolling. | To offer “smoothness” independently of the input precision.
Use a pixel-accurate scrolling model. | To enable high-resolution and smooth scrolling in principle.
Prefer a mouse with a high-resolution sensor & a high USB polling rate. | For more fine-grained control of the scrollbar thumb.
Capture pointer motion with subpixel precision (when a proper API is available). | To overcome the limitation of screen resolution.
Store scrollbar position with subpixel precision. | To actually benefit from the two previous recommendations.
Use contemporary APIs to acquire high-precision scrolling events. | To go beyond the legacy coarse-grained input.
Distinguish between angular and pixel deltas (when possible). | To correctly handle both step-based input devices and devices that offer continuous scrolling.
With virtualization, capture high-precision events in the host OS and relay them to the guest OS. | To provide high-precision input for the guest machine.
Animate / interpolate all kinds of input (from scrollbar, mouse wheel, trackpad, keyboard, navigation, etc.). | To compensate for the insufficient spatial and temporal resolution of typical sources.
Apply ease-in & ease-out curves. | To make the motion look more natural.
Provide scrolling acceleration / kinetic scrolling / elastic scrolling (when appropriate). | To improve control accuracy and to facilitate long-distance scrolling.
Use blitting (or a special buffer) to speed up the rendering. | To generate enough FPS even on less powerful hardware.
Make sure that the output is atomically double buffered. | To prevent rendering-related tearing.
Process the events and render the frames asynchronously. | To generate frames at a frequency that makes the motion smooth and that doesn’t depend on the frequency of input events.
Use V-sync or, preferably, an adaptive synchronization technology. | To avoid refresh-related tearing.
When V-sync is enabled, rely on a system API to properly synchronize the frame generation. | To attain perfect timing and to avoid “jank” (owing to skipped frames).
Perform the rendering in a separate thread. | To prevent interference from other activities and to avoid waiting for possible V-sync.
Use a display that supports a high refresh rate. | To make the resulting motion smoother.
Prefer a display with minimal motion blur / ghosting. | To reduce the image artifacts.
Switch to coarse-grained scrolling when hardware performance is insufficient, when remote desktop is active or when power saving is in use. | To improve usability and to save resources.
Provide an option to control high-resolution / smooth scrolling. | To account for individual preferences.
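
Several of these tips (subpixel positions, interpolation, ease-in & ease-out, frames decoupled from input events) can be illustrated together. The following is only a sketch under stated assumptions: the class name `ScrollAnimation`, the fixed duration and the choice of the smoothstep curve are illustrative and are not taken from any particular toolkit.

```java
/** Hypothetical sketch: time-based interpolation towards a target position. */
final class ScrollAnimation {
    private final double durationMs; // assumed fixed animation length, must be > 0
    private double from;             // position at the start of the current animation
    private double to;               // target position
    private double startMs;          // time when the current animation started

    ScrollAnimation(double initialPosition, double durationMs) {
        this.from = initialPosition;
        this.to = initialPosition;
        this.durationMs = durationMs;
    }

    /** Ease-in / ease-out curve (smoothstep): 0 maps to 0, 1 maps to 1, zero slope at both ends. */
    static double ease(double t) {
        return t * t * (3 - 2 * t);
    }

    /** Input side: start moving from the current (possibly intermediate) position to a new target. */
    void scrollTo(double target, double nowMs) {
        from = positionAt(nowMs);
        to = target;
        startMs = nowMs;
    }

    /** Rendering side: called once per frame, returns a subpixel position. */
    double positionAt(double nowMs) {
        double t = Math.min(1.0, Math.max(0.0, (nowMs - startMs) / durationMs));
        return from + (to - from) * ease(t);
    }
}
```

Because `positionAt` depends only on the current time, the renderer can call it at whatever frame rate the display provides, no matter how rarely `scrollTo` is invoked by input events. Note that restarting the curve on every new target resets the velocity to zero, so a production implementation would probably want to preserve the velocity when retargeting.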

Some of these recommendations can be implemented directly in the application, some require support at the OS level, and some can be acted on by end users.

Most of these findings have been tested in practice: it’s possible not only to catch up with, say, Safari, but to actually surpass it. With the right approach, this can be done in any toolkit (even one as seemingly “unfit” as Java Swing). Thus, no magic is required: technology is the key.

A lot of work has been done in the area of scrolling, yet there are still many things that can (and should) be improved so that we can indeed scroll with pleasure.

See also:

  • Typing with pleasure — Human- and machine aspects of typing latency, experimental data on latency of popular text / code editors.

6 Comments

  1. Pavel says:

    The article is probably too large by today’s standards. The only feedback so far is from Raph Levien (the main author of xi-editor). By the way, his Rope science is a very good read.

  2. Someone says:

    This article is fantastic and can’t be large enough. If you are writing a mouse driver (not many do these days), it is a revelation. Keep on going with your brilliant work. Thank you!

  3. Přemysl says:

    Very in-depth and well written. Never enough of such articles.

  4. Tobi says:

    Is there any “smooth scrolling” implementation for JavaFX on Mac OS available?

  5. Pavel says:

    Both Java and JavaFX platform-specific implementations don’t relay high-precision scrolling events (with the exception of angular deltas in Windows):

    * in Mac OS, deltaY and deltaX are used instead of scrollingDeltaY and scrollingDeltaX;
    * in Linux, legacy button events are used instead of XInput 2.x valuators;
    * in Windows, only (angular) WM_MOUSEWHEEL events are used, not Direct Manipulation API pixel-precise events.

    In Java, MouseWheelEvent can represent only angular (albeit, high-precision) deltas. In JavaFX, ScrollEvent includes pixel deltas as well.

    In Java, JScrollPane ignores high-precision scrolling events (both angular- and pixel ones). In JavaFX, ScrollPane can handle high-precision events correctly.

    Neither toolkit can animate / interpolate scrolling.

    As a result: there’s no high-precision scrolling in Java (AWT / Swing), while JavaFX has a limited high-precision scrolling in Windows (you can check this by running an example together with HiResScrollWheelEmulator). And even then, because there’s no interpolation, visual “smoothness” will depend solely on the actual resolution of particular hardware events: with certain touchpads / high-resolution mouse wheels, scrolling will be more or less smooth (but not truly smooth). Besides, there will be no smooth scrolling at all when an “ordinary” mouse wheel is used or when a touchpad / mouse driver doesn’t relay high-precision WM_MOUSEWHEEL messages (which is a very typical case in Windows).

    For high-precision scrolling, we need to correctly relay high-precision events in each platform-specific runtime (besides, in the case of AWT / Swing, we also need to add pixel deltas to MouseWheelEvent and to employ high-precision events in JScrollPane). For smooth scrolling, we need to support interpolation in the standard scrolling widgets.
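
    As a rough illustration of what an application can do with the angular high-precision deltas that AWT does expose, here is a sketch that accumulates MouseWheelEvent.getPreciseWheelRotation() values and keeps the subpixel remainder. The class name PreciseWheelAccumulator and the pixelsPerNotch parameter are hypothetical, and the sketch doesn’t add interpolation or true pixel deltas.

```java
import java.awt.event.MouseWheelEvent;
import java.awt.event.MouseWheelListener;

/** Hypothetical sketch: turns fractional wheel rotations into whole-pixel steps. */
final class PreciseWheelAccumulator implements MouseWheelListener {
    private final double pixelsPerNotch; // assumed value, chosen by the application
    private double remainder;            // subpixel part that has not been scrolled yet
    private long position;               // accumulated position in whole pixels

    PreciseWheelAccumulator(double pixelsPerNotch) {
        this.pixelsPerNotch = pixelsPerNotch;
    }

    @Override
    public void mouseWheelMoved(MouseWheelEvent e) {
        // getPreciseWheelRotation() returns fractional notches (available since Java 7).
        accept(e.getPreciseWheelRotation());
    }

    /** Adds a fractional rotation and returns the number of whole pixels to scroll now. */
    long accept(double preciseRotation) {
        remainder += preciseRotation * pixelsPerNotch;
        long whole = (long) remainder; // truncates towards zero
        remainder -= whole;
        position += whole;
        return whole;
    }

    long getPosition() {
        return position;
    }
}
```

    Such a listener could be registered with addMouseWheelListener on the scrolled component; whether fractional values actually arrive still depends on the platform-specific runtime and on the device driver, as described above.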

  6. Pavel says:

    Comments on Reddit: in /r/programming

    By the way, Apple just announced a new iPad Pro that “delivers refresh rates of up to 120Hz for fluid scrolling” and “reduces power consumption by automatically adjusting the display refresh rate to match the movement of the content” – that’s definitely a step in the right direction.
