
Data Structures

Osamu Ishimura edited this page Aug 3, 2019 · 15 revisions

ncclComm_t

It is declared in src/nccl.h.in as follows:

/* Opaque handle to communicator */
typedef struct ncclComm* ncclComm_t;

The underlying type is a pointer to the ncclComm struct, which is defined in src/include/comm.h.
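The opaque-handle pattern behind ncclComm_t can be sketched in a few lines of C. This is a minimal illustration, not NCCL code: demoComm, demoComm_t, and the demoComm* functions are hypothetical names, and both halves are placed in one file only so that the sketch compiles on its own.

```c
#include <stdlib.h>

/* Public side (would live in the public header): callers only see a
   pointer to a struct whose layout they do not need to know.
   demoComm_t is a hypothetical stand-in that mirrors ncclComm_t. */
typedef struct demoComm* demoComm_t;

/* Private side (would live in an internal header): the full definition. */
struct demoComm {
  int rank;    /* my rank in the communicator */
  int nRanks;  /* number of ranks in the communicator */
};

/* Hypothetical constructor: allocates the struct and returns the handle.
   Returns 0 on success and 1 on invalid arguments or allocation failure. */
int demoCommCreate(demoComm_t* comm, int rank, int nRanks) {
  if (comm == NULL || nRanks <= 0 || rank < 0 || rank >= nRanks) return 1;
  struct demoComm* c = malloc(sizeof(*c));
  if (c == NULL) return 1;
  c->rank = rank;
  c->nRanks = nRanks;
  *comm = c;
  return 0;
}

/* Accessors: the only way callers read fields through the handle. */
int demoCommRank(demoComm_t comm) { return comm->rank; }
int demoCommCount(demoComm_t comm) { return comm->nRanks; }

/* Releases the storage behind the handle. */
void demoCommDestroy(demoComm_t comm) { free(comm); }
```

Because callers hold only the pointer, the fields of the struct can change between versions without changing the public header.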

The ncclComm struct is defined as follows:

struct ncclComm {
  struct ncclChannel channels[MAXCHANNELS];

  struct ncclPeerInfo* peerInfo;

  void* bootstrap;

  int rank;    // my rank in the communicator
  int nRanks;  // number of GPUs in communicator
  int cudaDev; // my cuda device index
  int nvmlDev; // my NVML device number

  enum { GROUP, PARALLEL } launchMode;
  cudaStream_t userStream;
  bool userStreamSet;
  cudaEvent_t doneEvent;
  bool checkPointers;

  // Counter to make sure collectives match (needed for bcast/reduce
  // where syncs are not symmetric).
  uint64_t opCount;

  // Channels for collectives
  int nChannels;
  int nThreads;

  // Low-latency algorithm threshold
  ssize_t llThreshold;
  ssize_t threadThreshold;

  // Tree algorithm threshold
  ssize_t treeThreshold;

  // An internal CUDA stream for NCCL kernel CGMD launches
  int groupCudaStream;
  cudaStream_t groupStream;

  // Whether there has been a fatal error in this communicator.
  ncclResult_t fatalError;

  // Error reported by GPU
  volatile ncclDevError_t* fatalDevError;

  // Flag to ask NCCL kernels to abort
  volatile uint32_t *abortFlag;

  // Device side of the communicator
  struct ncclDevComm *devComm;
  // Host copy of the devComm (to free CUDA allocs)
  struct ncclDevComm hostDevComm;

  // Intra-process sync
  int intraRank;
  int intraRanks;
  int* intraBarrier;
  int intraPhase;

  // Storage for deferred intra-process launch
  struct cudaLaunchParams * intraParams;
  struct cudaLaunchParams *myParams;
  int* intraCudaDevs;
  int* intraCGMode; // Whether we can use CUDA9 CGMD or not
  int* intraCC; // Only to check all have the same ComputeCap and disable CGMode if not
  struct ncclColl args;
  void* argsptr;

  // Global proxy thread
  pthread_t proxyThread;
  struct ncclProxyState proxyState;
};

The non-standard types used in this struct are described below.

ncclChannel struct

Defined in src/include/devcomm.h.

The definition is as follows:

struct ncclChannel {
  union {
    struct {
      struct ncclRing ring;
      struct ncclTree tree;

      int id;
      int nthreads;
      int buffSize;

      // Communication structures
      struct ncclPeer* peers;
      struct ncclPeer* devPeers;

      // Operation list for aggregation
      struct ncclColl* collectives;
      struct ncclColl* devCollectives;
      int collStart;
      int collCount;
      int collFifoHead; // Only used by GPU
      int collFifoTail; // Only used by CPU
    };
    int data[0x80];
  };
};
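The outer union overlays the named fields on int data[0x80], so the struct occupies at least 0x80 ints (512 bytes where int is 4 bytes) regardless of how small the named fields are. The sketch below reproduces that layout trick with a hypothetical demoChannel type; the field subset and the helper function are assumptions made for illustration, and the reason NCCL pads the struct this way is not stated in the source.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLOT_INTS 0x80  /* same element count as data[] in ncclChannel */

/* Hypothetical stand-in for ncclChannel: a few named fields share storage
   with a fixed-size int array (anonymous struct/union, C11). */
struct demoChannel {
  union {
    struct {
      int id;
      int nthreads;
      int buffSize;
    };
    int data[DEMO_SLOT_INTS];  /* largest member, so it sets the union size */
  };
};

/* Compile-time check: the size is fixed by data[], not by the named fields. */
static_assert(sizeof(struct demoChannel) == DEMO_SLOT_INTS * sizeof(int),
              "demoChannel should be exactly DEMO_SLOT_INTS ints");

/* Hypothetical helper: byte offset of the index-th element in an array of
   channels; constant stride because every element has the same size. */
size_t demoChannelOffset(size_t index) {
  return index * sizeof(struct demoChannel);
}
```

Adding a named field to the inner struct leaves the size unchanged as long as the fields still fit in 0x80 ints; once they no longer fit, the static_assert fails at compile time.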

The non-standard types used in this struct are described below.

ncclPeerInfo struct

ncclDevComm struct

struct ncclDevComm {
  int rank;
  int nRanks;

  // Flag to ask NCCL kernels to abort
  volatile uint32_t *abortFlag;
  volatile ncclDevError_t *fatalDevError;

  // Channels, device side
  struct ncclChannel* channels;
};

ncclColl struct

ncclProxyState struct

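ncclResult_t type

Defined in src/nccl.h.in.

It defines six result codes, followed by a seventh enumerator that holds their count:

  • ncclSuccess: success
  • ncclUnhandledCudaError: a CUDA error
  • ncclSystemError: a system error
  • ncclInternalError: an internal error
  • ncclInvalidArgument: an invalid argument
  • ncclInvalidUsage: invalid usage
  • ncclNumResults: not a result code; it is the last enumerator and equals the number of result codes above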
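A count enumerator such as ncclNumResults is typically used to size lookup tables and to range-check values. The sketch below illustrates this with a local copy of the enum so that it compiles without nccl.h; the numeric values are assumed to follow the list order starting at 0, and demoResultName and demoIsError are hypothetical helpers, not NCCL functions.

```c
#include <stddef.h>

/* Local copy of the enum so the sketch compiles without nccl.h.
   Assumption: values follow the list order, starting at 0. */
typedef enum {
  ncclSuccess            = 0,
  ncclUnhandledCudaError = 1,
  ncclSystemError        = 2,
  ncclInternalError      = 3,
  ncclInvalidArgument    = 4,
  ncclInvalidUsage       = 5,
  ncclNumResults         = 6   /* count of the codes above, not a code itself */
} ncclResult_t;

/* Hypothetical helper (not part of NCCL): maps a result code to its name.
   ncclNumResults sizes the table and bounds the lookup. */
const char* demoResultName(ncclResult_t r) {
  static const char* const names[ncclNumResults] = {
    "ncclSuccess",
    "ncclUnhandledCudaError",
    "ncclSystemError",
    "ncclInternalError",
    "ncclInvalidArgument",
    "ncclInvalidUsage",
  };
  if ((int)r < 0 || (int)r >= (int)ncclNumResults) return "unknown";
  return names[r];
}

/* Hypothetical helper: any value other than ncclSuccess counts as a failure. */
int demoIsError(ncclResult_t r) { return r != ncclSuccess; }
```

Passing ncclNumResults itself to demoResultName returns "unknown", since it falls outside the range of real result codes.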