Introduction

The oneAPI Collective Communications Library (oneCCL) provides primitives for the communication patterns that occur in deep learning applications. oneCCL supports both scale-up for platforms with multiple oneAPI devices and scale-out for clusters with multiple compute nodes.

oneCCL supports the following communication patterns used in deep learning (DL) algorithms:

Allreduce
Allgatherv
Broadcast
Reduce
Alltoall
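
As a rough illustration of what these five patterns compute, the sketch below models each rank as an entry in a plain Python list. It is reference semantics only: it does not use or mirror the oneCCL API, the function names are hypothetical, and summation is assumed as the reduction operation.

```python
# Reference semantics only. Each "rank" is one entry of a Python list;
# nothing here calls oneCCL, and all function names are illustrative.

def allreduce(bufs):
    """Every rank ends up with the element-wise sum over all ranks."""
    total = [sum(vals) for vals in zip(*bufs)]
    return [list(total) for _ in bufs]

def reduce_to_root(bufs, root):
    """Only the root rank receives the element-wise sum; others get None."""
    total = [sum(vals) for vals in zip(*bufs)]
    return [total if rank == root else None for rank in range(len(bufs))]

def broadcast(bufs, root):
    """Every rank receives a copy of the root's buffer."""
    return [list(bufs[root]) for _ in bufs]

def allgatherv(bufs):
    """Every rank receives all buffers concatenated in rank order.
    The 'v' means per-rank buffer sizes may differ."""
    gathered = [x for buf in bufs for x in buf]
    return [list(gathered) for _ in bufs]

def alltoall(bufs):
    """Rank i sends its j-th element to rank j (one element per peer here)."""
    n = len(bufs)
    return [[bufs[src][dst] for src in range(n)] for dst in range(n)]

ranks = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(allreduce(ranks)[0])                    # [111, 222, 333]
print(reduce_to_root(ranks, root=1))          # [None, [111, 222, 333], None]
print(broadcast(ranks, root=2)[0])            # [100, 200, 300]
print(allgatherv([[1], [2, 3], [4, 5, 6]])[0])  # [1, 2, 3, 4, 5, 6]
print(alltoall(ranks))                        # [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
```

In this model Alltoall amounts to a transpose of the rank-by-element matrix, while Allreduce is equivalent to a Reduce followed by a Broadcast.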

oneCCL exposes controls over additional optimizations and capabilities such as:

User-defined pre-/post-processing of incoming buffers and user-defined reduction operations
Prioritization for communication operations
Persistent communication operations (enables decoupling one-time initialization and repetitive execution)
Fusion of multiple communication operations into a single one
Unordered communication operations
Allreduce on sparse data
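
To make the idea of fusion concrete, here is a minimal sketch that packs several small per-rank buffers into one flat buffer, runs a single simulated allreduce, and unpacks the result. It assumes summation as the reduction and models ranks as Python lists; the names `allreduce_sum` and `fused_allreduce` are hypothetical and say nothing about how oneCCL implements or exposes fusion.

```python
# Illustrative model of fusion; not the oneCCL implementation or API.

def allreduce_sum(bufs):
    """Simulated allreduce: every rank gets the element-wise sum."""
    total = [sum(vals) for vals in zip(*bufs)]
    return [list(total) for _ in bufs]

def fused_allreduce(per_rank_tensors):
    """per_rank_tensors[r] is the list of small buffers owned by rank r.
    All ranks are assumed to hold buffers of matching sizes."""
    sizes = [len(t) for t in per_rank_tensors[0]]

    # Pack: concatenate each rank's small buffers into one flat buffer.
    packed = [[x for t in tensors for x in t] for tensors in per_rank_tensors]

    # One collective call instead of one call per small buffer.
    reduced = allreduce_sum(packed)

    # Unpack: slice the flat result back into the original shapes.
    out = []
    for flat in reduced:
        tensors, offset = [], 0
        for size in sizes:
            tensors.append(flat[offset:offset + size])
            offset += size
        out.append(tensors)
    return out

grads = [
    [[1, 2], [3]],      # rank 0: two small buffers
    [[10, 20], [30]],   # rank 1: two small buffers
]
print(fused_allreduce(grads)[0])  # [[11, 22], [33]]
```

The result matches what separate allreduce calls on each small buffer would produce; the potential benefit is paying per-operation overhead once rather than once per buffer.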

Intel has published an open source implementation under the Apache license. The open source implementation includes a comprehensive test suite. Consult the README for directions.