    block: introduce the BFQ-v7r8 I/O sched for 3.4 · e60664ca
    Paolo Valente authored
    
    
    Add the BFQ-v7r8 I/O scheduler to 3.4.
    The general structure is borrowed from CFQ, as is much of the code for
    handling I/O contexts. Over time, several useful features have been
    ported from CFQ as well (details in the changelog in README.BFQ). A
    (bfq_)queue is associated with each task doing I/O on a device, and
    each time a scheduling decision has to be made, a queue is selected and
    served until it expires.
    
        - Slices are given in the service domain: tasks are assigned
          budgets, measured in number of sectors. Once it has been granted
          the disk, a task must nevertheless consume its assigned budget
          within a configurable maximum time (by default, the maximum
          possible value of the budgets is automatically computed to
          comply with this timeout). This allows the desired latency vs
          "throughput boosting" tradeoff to be set.
    
        - Budgets are scheduled according to a variant of WF2Q+, implemented
          using an augmented rb-tree to take eligibility into account while
          preserving an O(log N) overall complexity.
    
        - A low-latency tunable is provided; if enabled, both interactive
          and soft real-time applications are guaranteed a very low latency.
    
        - Latency guarantees are also preserved in the presence of NCQ.
    
        - High throughput is also achieved with flash-based devices,
          while still preserving latency guarantees.
    
        - BFQ features Early Queue Merge (EQM), a sort of fusion of the
          cooperating-queue-merging and the preemption mechanisms present
          in CFQ. EQM is in fact a unified mechanism that tries to get a
          sequential read pattern, and hence a high throughput, with any
          set of processes performing interleaved I/O over a contiguous
          sequence of sectors.
    
        - BFQ supports full hierarchical scheduling, exporting a cgroups
          interface.  Since each node has a full scheduler, each group can
          be assigned its own weight.
    
        - If the cgroups interface is not used, only I/O priorities can be
          assigned to processes, with ioprio values mapped to weights
          with the relation weight = IOPRIO_BE_NR - ioprio.
    
        - ioprio classes are served in strict priority order, i.e., lower
          priority queues are not served as long as there are higher
          priority queues.  Among queues in the same class the bandwidth is
          distributed in proportion to the weight of each queue. A very
          thin extra bandwidth is however guaranteed to the Idle class, to
          prevent it from starving.
    Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
    Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>