August 2017

Can the performance of a message-oriented system be improved by adding periods of idle execution?

A period of time required for the scheduler to switch the CPU from one actor to another.

Spinning before a context switch potentially improves performance.

But not always.

An actor should delay the context switch only if it is likely that a message will arrive.

1. When should an actor spin?

2. How long should an actor spin?

There exists a simple upper bound.

d (delay) < c (context switch)

How to pick a delay in [0, c>?

Assume that P is the probability that a message arrives after some time d.

The actor also needs to occasionally sample P, i.e. spin several times to determine P.

We will assume that the actor samples with some probability φ.

Conclusion: given a probability p that a message arrives after δ, an actor should delay the context switch if:

δ < p

Before the context switch:

1. With probability φ << 1, spin c time.

2. Record the message arrival times t_{i}.

3. After collecting N samples,
group the delay times into M buckets,
and report the pairs δ_{i}, p_{i}.

Before a context switch:

1. Take the last reported M pairs δ_{i}, p_{i}.

2. Pick the pair with minimal δ_{i} - p_{i}.

3. If the difference is positive,
wait δ_{i} time before context switching.

For 95% confidence that the sampled probability P lies within 15% of its true value, we need n = 43.

For a sampling probability φ = 0.01, the expected number of activations per actor is ~4300.

But, actors are short-lived.

In many applications, most actors are not activated 4300 times.

How to obtain a sample more quickly?

Always set the sampling frequency φ to a value
proportionate to the running best probability p_{i}.

Intuition: If the message arrival probability P is high, then more frequent sampling is not detrimental.

I.e. an actor spends more time sampling, but it is likely that messages will arrive during sampling, saving a context switch.

If the message arrival probability P is low, then less frequent sampling is not detrimental.

I.e. it might take a long time to collect the sample, but it's unlikely that it will help anyway.

Initially, set the sampling frequency
to some small value φ_{0}.
After each sampling, set the frequency to:

Expected number of messages before obtaining a sample:

Lower bound on speedup:

Shams and Sarkar, AGERE 2014

Ping-Pong

Thread Ring

Fork Join Throughput

Use a spacebar or arrow keys to navigate