SHArP - Scalable Hierarchical Aggregation Protocol (1.1.0)
-------------------------------------------------------------------------------

Copyright 2016 Mellanox.

License
-------------------------------------------------------------------------------

See LICENSE file.

Overview
-------------------------------------------------------------------------------
This document addresses system-level management of Scalable Hierarchical
Aggregation Protocol (SHArP) resources. It covers the system-wide resource
manager (Aggregation Manager, AM), the SHArP Daemon (SD), which is local to each
compute node and provides access to switch-based collective communication
capabilities, and the user-level communication libraries libsharp and libsharp_coll.


### Terminology

* __AN (Aggregation Node)__:  ASIC hardware and local firmware implemented in
  Switch-IB 2.
* __Tree (Aggregation Tree)__: a tree of ANs that describes data reduction
  topology.
* __Job__: SHArP resources are allocated for a job.
* __Group__: a group represents a collective operation.
* __AM (Aggregation Manager)__: system-wide entity responsible for SHArP resource management.
* __SD (SHArP Daemon)__: daemon local to each compute node, responsible for connection
  establishment. In the notation SD#__n__, n is the rank
  of the SD in the job.
  The SD that created the job has special responsibilities, including communication
  with AM and resource management at the job level. In the current implementation, MPI rank 0
  initiates job creation, so SD#0 is the special SD.
* __libsharp API__: a library (shared object) used to instruct SD to perform actions.
* __libsharp_coll API__: high-level API that exposes a collective abstraction over SHArP.
* __SMX__ : communication library used for SD to AM and SD to SD messaging.
* __OST__ :  Outstanding Operation.
* __Group channel__: a client process (MPI process) in the node selected for
  sending collective operations to the assigned AN.
* __Radix__: the number of children of an Aggregation Node.
  Switch-IB 2 limits this number to 64.
* __Child index__: the index of a group member in the list of node children.


### Aggregation Manager

The Aggregation Manager (AM) is a system management component used for system
level configuration and management of the switch-based reduction capabilities.
It is used to set up the SHArP trees and to manage their use.

AM is responsible for:

* SHArP resource discovery.
* Creating topology aware SHArP trees.
* Configuring SHArP switch capabilities.
* Managing SHArP resources.
* Assigning SHArP resource on request.
* Freeing SHArP resources on job termination.

AM is configured with a topology file created by the Subnet Manager (SM): subnet.lst.
The file includes information about switches and HCAs.

Relevant parameters (AM):

* `fabric_lst_file`

Once the topology is loaded, AM discovers SHArP capabilities using MADs. During
discovery, AM cleans the SHArP resources allocated in each AN.

Relevant parameters (AM):

* `clean_an_on_discovery`

Based on the topology, AM creates Aggregation Trees. An Aggregation Tree is
a logical tree that defines the flow of collective operations. The communication channels
(QPs) between tree nodes are created during system initialization.

A user can configure pre-defined trees in AM. In the user-defined trees file,
the ANs are identified by the node names, as in the topology file created by the SM.
The file format is as follows:

```
tree <tree-id>
node {node description} [GUID:<port_guid_num>]
subNode {node description} [GUID:<port_guid_num>]
subNode {node description} [GUID:<port_guid_num>]
...
node {node description} [GUID:<port_guid_num>]
subNode {node description} [GUID:<port_guid_num>]
...
node {node description} [GUID:<port_guid_num>]
computePort {node description} [GUID:<port_guid_num>]
computePort {node description} [GUID:<port_guid_num>]
...
tree <tree-id>
node {node description} [GUID:<port_guid_num>]
```
See also [Trees Configuration Reference](doc/TreesConfigurationFile.md).
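
As an illustration of the layout above, the following Python sketch groups the lines of a trees file by keyword. It assumes that every entry starts with one of the keywords `tree`, `node`, `subNode` or `computePort` and keeps the rest of the line as an opaque string. The function name `parse_trees`, the returned structure and the node names and GUIDs in the example are hypothetical and not part of SHArP; the Trees Configuration Reference remains the authority on the exact syntax.

```python
def parse_trees(text):
    """Group trees-file lines into {tree_id: [{"node": str, "children": [(kind, str)]}]}."""
    trees = {}
    current_nodes = None   # node list of the tree currently being read
    current_node = None    # node whose children are currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue
        parts = line.split(None, 1)
        keyword = parts[0]
        rest = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "tree":
            current_nodes = trees.setdefault(rest, [])
            current_node = None
        elif keyword == "node":
            if current_nodes is None:
                raise ValueError("node entry before any tree entry: " + line)
            current_node = {"node": rest, "children": []}
            current_nodes.append(current_node)
        elif keyword in ("subNode", "computePort"):
            if current_node is None:
                raise ValueError("child entry before any node entry: " + line)
            current_node["children"].append((keyword, rest))
        else:
            raise ValueError("unexpected line: " + line)
    return trees


# Made-up example input; names and GUIDs are placeholders.
example = """\
tree 0
node sw-root GUID:0x1
subNode sw-leaf GUID:0x2
node sw-leaf GUID:0x2
computePort host-1 GUID:0x10
"""
print(parse_trees(example))
```

Keeping the node description and GUID as one string avoids guessing at quoting rules that this sketch does not know.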

Relevant parameters (AM):

* `trees_file`

AM computes Aggregation Trees automatically for a quasi fat-tree
topology, based on a user-defined root GUIDs file.

Relevant parameters (AM):

* `root_guids_file`

For a new job launch, AM allocates SHArP resources. The resource allocation
includes two main steps:

* __Tree matching.__ AM selects an available tree that has a non-broken subtree spanning
  all job hosts. For each host, AM assigns the AN with which the host may form a connection.
* __Resource allocation.__ AM sets resources for each AN that serves the job. This includes
  buffers, OSTs, the maximum number of groups and the QPs available for children connections.

 Relevant parameters (AM):

 * `max_tree_radix`
 * `max_quota`
 * `default_quota`
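
The tree matching step can be pictured as a coverage check. The sketch below is a simplified model, not the AM implementation: the data layout (a mapping from tree id to a host-to-AN assignment), the function name `match_tree` and the host and AN names are assumptions made for illustration, and broken subtrees and resources already in use are ignored.

```python
def match_tree(trees, job_hosts):
    """Return (tree_id, {host: an}) for the first tree that reaches every job host, or None."""
    wanted = set(job_hosts)
    for tree_id, host_to_an in trees.items():
        if wanted.issubset(host_to_an):
            # Report, for each job host, the AN it would connect to.
            return tree_id, {host: host_to_an[host] for host in job_hosts}
    return None


# Hypothetical fabric: tree 0 does not reach host-3, tree 1 reaches all three hosts.
trees = {
    0: {"host-1": "an-a", "host-2": "an-a"},
    1: {"host-1": "an-a", "host-2": "an-a", "host-3": "an-b"},
}
print(match_tree(trees, ["host-1", "host-3"]))
```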

 A user application may request a specific amount of SHArP resources. An application can operate with
 OSTs, user data per group and number of groups. If any of these resources is 0, AM uses the default value from its
 configuration file. OSTs, user data per OST and max radix are translated into the size of the buffer that AM allocates for the job.
 AM can return fewer resources to the application than requested, or even decline the resource allocation request. If there are no
 available resources for the job, HCOLL implements fallback.

 Relevant parameters (HCOLL, SHARP_COLL):

 * `HCOLL_ENABLE_SHARP`
 * `SHARP_COLL_JOB_QUOTA_OSTS`
 * `SHARP_COLL_JOB_QUOTA_PAYLOAD_PER_OST`
 * `SHARP_COLL_JOB_QUOTA_MAX_GROUPS`
 * `SHARP_COLL_JOB_QUOTA_MAX_QPS_PER_PORT`
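
The interplay of requested, default and maximum quota can be sketched as follows. The numbers are the default values of `default_quota` and `max_quota` from the AM usage text. The `Quota` tuple and the function `resolve_quota` are hypothetical, and capping a request at the maximum is an assumption: the text only states that a zero field falls back to the default, that no job receives more than the maximum quota, and that AM may grant less or decline.

```python
from collections import namedtuple

# Field order follows the quota format printed by the AM usage text.
Quota = namedtuple("Quota", "trees osts user_data_per_ost groups qps_per_port")

DEFAULT_QUOTA = Quota(1, 16, 128, 8, 64)    # default value of default_quota
MAX_QUOTA = Quota(1, 500, 256, 500, 180)    # default value of max_quota


def resolve_quota(requested, default=DEFAULT_QUOTA, maximum=MAX_QUOTA):
    """Replace zero fields with defaults, then cap every field at the maximum (assumed policy)."""
    filled = [req if req != 0 else dflt for req, dflt in zip(requested, default)]
    return Quota(*(min(value, cap) for value, cap in zip(filled, maximum)))


# A job that asks for 1000 OSTs and leaves every other field unspecified.
print(resolve_quota(Quota(0, 1000, 0, 0, 0)))
```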

AM can read configuration parameters from the command line, environment variables or a configuration file.
AM supports the following configuration parameters:

```
Aggregation Manager 1.1.0
-------------------------
Usage: sharp_am [OPTION]:

OPTIONS:
  -O, --config_file <value>:
	Configuration file
	default value: /etc/sharp/sharp_am.cfg

  -l, --log_file <value>:
	Log file
	default value: /var/log/sharp_am.log

  --log_verbosity <value>:
	Log verbosity level:
	1 - Errors
	2 - Warnings
	3 - Info
	4 - Debug
	5 - Trace
	default value: 2

  -V, --verbose:
	Run with full verbosity

  --log_max_backup_files <value>:
	Number of backup log files. Used for log rotation
	default value: 9

  --log_file_max_size <value>:
	Maximum size of a log file, in MBs
	If value is 0,log rotation isn't used
	default value: 64

  -B, --daemon:
	Run in daemon mode - sharp_am will run in the background

  -p, --pid_file <value>:
	PID file. Makes sharp_am to write its PID to the specified file when running as daemon
	default value: /var/run/sharp_am.pid

  -c, --create_config <value>:
	sharp_am will dump its configuration to the specified file and exit
	default value: (null)

  -t, --trees_file <value>:
	SHArP trees file
	If NULL, calculate trees automatically
	default value: (null)

  --max_tree_radix <value>:
	The maximum radix used in the system.
	default value: 64

  --clean_an_on_discovery <value>:
	Automatically clean all resources on aggregation nodes when discovered.
	default value: TRUE

  --fabric_lst_file <value>:
	Fabric LST file
	default value: ./fabric.lst

  --root_guids_file <value>:
	Root guids file
	default value: ./root_guid.cfg

  --dump_dir <value>:
	Path to dump files directory
	default value: .

  --generate_dump_files <value>:
	Dump internal state to files for debug and diagnostics
	default value: FALSE

  --max_quota <value>:
	Maximum quota that can be requested by a single job
	It is guarantee that no job will receive more than max quota
	Format: "(Trees-per-job, OSTs-per-tree, User-data-per-ost, Groups-per-tree, QPs-per-port-per-tree)"
	default value: (1, 500, 256, 500, 180)

  --default_quota <value>:
	Default quota to be requested for a single job
	The quota that will be requested for a job if no quota was requested explicitly
	Format: "(Trees-per-job, OSTs-per-tree, User-data-per-ost, Groups-per-tree, QPs-per-port-per-tree)"
	default value: (1, 16, 128, 8, 64)

  -g, --ib_port_guid <value>:
	GUID of the port to which aggregation manager binds to
	default value: 0x0

  --ib_max_mads_on_wire <value>:
	Maximum number of MADs that can be sent before waiting for respond
	default value: 100

  --ib_mad_timeout <value>:
	Maximum time [in milliseconds] to wait for MADs transaction to complete
	default value: 200

  --ib_mad_retries <value>:
	Maximum number of retries for timed out MADs transaction
	default value: 3

  --ib_am_key <value>:
	AM key
	default value: 0x0

  --ib_sharp_sl <value>:
	SL for SHArP control path communication (MADs)
	default value: 0

  --support_multicast <value>:
	Support return result by multicast
	default value: TRUE

  --ib_qpc_transport_service <value>:
	IB QP Context - transport service
	0 - Reliable connection
	1 - Unreliable connection
	2 - Reliable datagram
	3 - Unreliable datagram
	4 - Dynamically connected
	default value: 0

  --ib_qpc_use_grh <value>:
	IB QP Context - Use GRH for AN to AN communication
	default value: FALSE

  --ib_qpc_pkey <value>:
	IB QP Context - Partition Key for SHArP
	default value: 0xFFFF

  --ib_qpc_sl <value>:
	IB QP Context - SL for SHArP data path communication
	default value: 0

  --ib_qpc_traffic_class <value>:
	IB QP Context - Traffic class for SHArP
	default value: 0

  --ib_qpc_rq_psn <value>:
	IB QP Context - The transport Packet Sequence Number at which
	the remote end of the QP shall begin transmitting over the
	newly established channel. This value should be chosen to
	minimize the chance that a packet from a previous connection
	could fall within the valid PSN window
	default value: 0

  --ib_qpc_sq_psn <value>:
	IB QP Context - The transport Packet Sequence Number at which
	the local end of the QP shall begin transmitting over the newly
	established channel. This value should be chosen to minimize
	the chance that a packet from a previous connection could fall
	within the valid PSN window
	default value: 0

  --ib_qpc_rnr_mode <value>:
	IB QP Context - RNR mode
	0 - SHArP level resources does not apply for RNR
	1 - SHArP level resources apply to the IB transport RNR NACK
	default value: 0

  --ib_qpc_rnr_retry_limit <value>:
	IB QP Context - RNR retry limit
	The total number of times that the sender wishes the receiver to
	retry RNR NAK errors before posting a completion error
	default value: 0x7

  --ib_qpc_local_ack_timeout <value>:
	IB QP Context - Local ACK timeout
	Value representing the transport (ACK) timeout for use by the
	remote end, expressed as (4.096 µS * 2^(Local ACK Timeout))
	default value: 0x1F

  --ib_qpc_timeout_retry_limit <value>:
	IB QP Context - Timeout retry limit
	The total number of times that the sender wishes the receiver to
	retry timeout, packet sequence, etc. errors before posting a 
	completion error
	default value: 7

  -h, --help:
	Show usage and exit

  -v, --version:
	Prints sharp_am version and exit

```
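
The quota options above share one textual format, a parenthesized list of five integers. A minimal Python sketch of reading that format is shown below. The field names and the function `parse_quota` are hypothetical, and whether `sharp_am` itself tolerates other spacing or fewer fields is not stated here.

```python
import re

# Names derived from the documented format string; they are labels for this sketch only.
QUOTA_FIELDS = ("trees_per_job", "osts_per_tree", "user_data_per_ost",
                "groups_per_tree", "qps_per_port_per_tree")


def parse_quota(value):
    """Parse a quota string such as "(1, 16, 128, 8, 64)" into a dict of ints."""
    match = re.fullmatch(r"\s*\(\s*(\d+(?:\s*,\s*\d+){4})\s*\)\s*", value)
    if match is None:
        raise ValueError("expected five comma-separated integers in parentheses: %r" % value)
    numbers = [int(part) for part in match.group(1).split(",")]
    return dict(zip(QUOTA_FIELDS, numbers))


print(parse_quota("(1, 16, 128, 8, 64)"))
```
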
### SHArP Daemon

The SHArP Daemon is local to each node and is expected to persist as long as the network is available.
SD interacts with the following entities:

 * AM. Job startup/termination.
 * SM. Service record fetching.
 * Other SD. Group creation and destruction.
 * libsharp communication library. Job/Group management.

Only SD#0 interacts with AM. The interaction is limited to sending a resource allocation
request for a job, receiving job data and sending a termination request. Distribution of job data
among the SDs participating in the job is out of scope of the SHArP software and has to be done at the
MPI level using the push API.
SD#0 is responsible for resource management at the communicator level. SD#n (n > 0) interacts with
SD#0 and requests resources for a group. For each group, a fraction of the available resources
can be allocated.
A user application can control the resource allocation policy using the following environment variables:

* `SHARP_COLL_GROUP_RESOURCE_POLICY`: 1 - equal, 2 - take_all by first group, 3 - user input percent
* `SHARP_COLL_USER_GROUP_QUOTA_PERCENT`
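
To make the three policy values concrete, here is a small Python sketch of how a per-group share of OSTs could be derived. Only the policy numbers come from the list above. The function `group_share`, its parameters, and the arithmetic for the "equal" and "percent" cases are assumptions for illustration and may differ from what SD#0 actually does.

```python
EQUAL, TAKE_ALL, USER_PERCENT = 1, 2, 3   # values of SHARP_COLL_GROUP_RESOURCE_POLICY


def group_share(policy, job_osts, available_osts, max_groups, user_percent=0):
    """Return how many OSTs a new group would get under the given policy (illustrative only)."""
    if policy == EQUAL:
        share = job_osts // max_groups            # assumed: even split over the maximum group count
    elif policy == TAKE_ALL:
        share = available_osts                    # the first group takes whatever is left
    elif policy == USER_PERCENT:
        share = job_osts * user_percent // 100    # assumed: percentage of the job quota
    else:
        raise ValueError("unknown policy: %r" % policy)
    return min(share, available_osts)             # never hand out more than is left


print(group_share(EQUAL, job_osts=16, available_osts=16, max_groups=8))
print(group_share(TAKE_ALL, job_osts=16, available_osts=16, max_groups=8))
print(group_share(USER_PERCENT, job_osts=16, available_osts=16, max_groups=8, user_percent=25))
```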

SD connects a local MPI process to an Aggregation Tree. The connection is based on an RC QP connected to the nearest
AN. AM is responsible for assigning an AN to each compute port. The connection can be reused for
multiple collective operations. Each group must be joined to the Aggregation Tree before sending
collective operations.

If multiple processes on the same node participate in the group, HCOLL
can group these processes based on socket locality and use multiple processes for sending collective
operations to the network. Inside the sub-group, shared memory is used for the collective. A group channel
process is a process selected for participating in the SHArP group. An application can request a number of
group channels from AM. Multiple group channels affect the tree radix and, as a result, buffer allocation in the AN.
If the AN can't allocate the requested number of group channels, the MPI job fails. See [Multi-channel group].

Communication between the MPI process (libsharp) and SD is based on UNIX domain sockets.

A detailed description of the flow between SD and the MPI process can be found in [sharp.h](src/api/sharp.h).

SD discovers the AM address by fetching a Service Record from SM.

SD has limited support for resiliency features:

* If AM connection is broken, SD tries to reconnect to AM.
* SD#0 monitors MPI#0. If the process dies, SD#0 issues job termination request to AM.
  The monitoring is based on socket hangup status and doesn't require CPU cycles.
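
Hangup-based monitoring of a UNIX domain socket can be illustrated with the Python sketch below. It is a generic demonstration of the technique, not SD code: `peer_hung_up` is a hypothetical helper, and a `socketpair` stands in for the connection between SD#0 and the MPI rank 0 process. It relies on `select.poll`, which is available on Linux but not on Windows.

```python
import select
import socket


def peer_hung_up(sock, timeout_ms=1000):
    """Sleep in poll() until the peer closes the connection or the timeout expires."""
    poller = select.poll()
    poller.register(sock, select.POLLIN | select.POLLHUP)
    for _fd, events in poller.poll(timeout_ms):
        if events & select.POLLHUP:
            return True
        # Some platforms report a closed peer as a readable end-of-file instead.
        if events & select.POLLIN and sock.recv(1, socket.MSG_PEEK) == b"":
            return True
    return False


# Stand-ins for the daemon side and the client side of the connection.
daemon_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
print(peer_hung_up(daemon_end, timeout_ms=10))   # client is still connected
client_end.close()                               # simulate the client process exiting
print(peer_hung_up(daemon_end))                  # the hangup is now visible
daemon_end.close()
```

The watcher spends the waiting time blocked inside `poll`, which is why this style of monitoring costs no CPU cycles while the client is alive.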

For any job, HCOLL issues two end-job requests: one through SD#0 and one through the last SD. The redundant
job termination request covers a crash of SD#0.


### Inter-component messaging

SMX messaging library is responsible for communication between SHArP software components.
There are two communication protocols:

* AM <-> SD#0. This protocol is used at the job level. It includes the following messages:

   * SHARP_MSG_TYPE_BEGIN_JOB
   * SHARP_MSG_TYPE_END_JOB
   * SHARP_MSG_TYPE_JOB_DATA

SD#0 initiates the connection to AM. SD discovers AM's address using the service record. No special configuration
is needed in a production environment. For debug purposes, the SMX_AM_SERVER environment variable can be used.

* SD <-> SD#0. This protocol is used at the communicator level and includes the following messages:

   * SHARP_MSG_TYPE_ALLOC_GROUP
   * SHARP_MSG_TYPE_GROUP_DATA
   * SHARP_MSG_TYPE_GET_JOB_DATA
   * SHARP_MSG_TYPE_RELEASE_GROUP

SD#n (n > 0) learns the SD#0 address from the job information distributed among SDs.
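
The split of message types between the two protocols can be written down as a small lookup table. The message names are the ones listed above; the `Peer` enum and the helper `protocol_of` are hypothetical and only illustrate how a receiver might check that a message arrived on the expected protocol.

```python
from enum import Enum


class Peer(Enum):
    AM = "AM <-> SD#0 (job level)"
    SD = "SD <-> SD#0 (communicator level)"


# Grouping mirrors the two message lists in this section.
PROTOCOL_MESSAGES = {
    Peer.AM: (
        "SHARP_MSG_TYPE_BEGIN_JOB",
        "SHARP_MSG_TYPE_END_JOB",
        "SHARP_MSG_TYPE_JOB_DATA",
    ),
    Peer.SD: (
        "SHARP_MSG_TYPE_ALLOC_GROUP",
        "SHARP_MSG_TYPE_GROUP_DATA",
        "SHARP_MSG_TYPE_GET_JOB_DATA",
        "SHARP_MSG_TYPE_RELEASE_GROUP",
    ),
}


def protocol_of(message_name):
    """Return the protocol a message type belongs to, or raise KeyError if it is unknown."""
    for peer, names in PROTOCOL_MESSAGES.items():
        if message_name in names:
            return peer
    raise KeyError(message_name)


print(protocol_of("SHARP_MSG_TYPE_ALLOC_GROUP").value)
```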

SMX wraps the following underlying communication mechanisms:

*   TCP socket. This is the main communication mechanism used in production environments. A user
    has to configure at least one network interface.
*   Files. This mode serves debug and verification purposes.
*   UCX. This mode allows in-band message communication and uses
    [UCX - Unified Communication X library](https://github.com/openucx/ucx).
    This is an experimental mode and can't be used in a production environment.

Relevant parameters (AM, SD):

* `smx_protocol`
* Environment variable: `SMX_SOCK_INTERFACE`
* Environment variable: `SMX_SOCK_PORT`
* Environment variable: `SMX_AM_SERVER`

### MAD communication

AM uses [ibis](https://github.com/Mellanox/ibis_tools) for high-performance, parallel processing.
SD is a libibumad-based application.

### APIs

SHArP includes two APIs:

* [libsharp_coll](src/api/sharp_coll.h). This is the high-level public API,
  available for third-party integration.
* [libsharp](src/api/sharp.h). This is the low-level private API.

libsharp is the interface library used for communication with the local SD. A UNIX domain socket is used for the communication.

Prerequisites
-------------------------------------------------------------------------------

 * SwitchIB-2 based fabric.
 * HPCx 1.6 bundle.
 * The following SwitchIB-2 FW and MOFED versions (or later) are sufficient:
	|SHArP version      |MOFED version         |SwitchIB-2 FW|
	|-------------------|----------------------|-------------|
	|v1.0               |MLNX OFED 3.3-x.x.x   |15.1100.0072 |
	|v1.1               |MLNX OFED 3.4-0.1.2.0 |15.1200.0076 |
 * MLNX OS 3.6.1002.
 * MLNX OpenSM 4.7.0 or later (available with MLNX OFED 3.3-x.x.x or UFM 5.6). [opensm.tgz](ftp://bgate.mellanox.com/upload/sharp/opensm_latest.tgz)
 * ConnectX HCA.
 * Kernel >= 2.6.22.
 * SHArP is compiled on the following operating systems:

|Distro      |Platform |Kernel          |
|------------|---------|----------------|
|RHEL 6.1    |x86-64   |2.6.32-131.0.15 |
|RHEL 6.2    |x86-64   |2.6.32-220      |
|RHEL 6.3    |x86-64   |2.6.32-279      |
|RHEL 6.4    |x86-64   |2.6.32-358      |
|RHEL 6.5    |x86-64   |2.6.32-431      |
|RHEL 7.0    |x86-64   |3.10.0-123      |
|RHEL 7.2    |x86-64   |3.10.0-327      |
|RHEL 7.2    |ppc64le  |3.10.0-327      |
|Fedora14    |x86-64   |2.6.35.6-45     |
|Fedora16    |x86-64   |3.1.0-7         |
|Fedora17    |x86-64   |3.3.4-5         |
|Fedora18    |x86-64   |3.6.10-4        |
|SLES 11 SP1 |x86-64   |2.6.32.12-0.7   |
|SLES 11 SP2 |x86-64   |3.0.13-0.27     |
|SLES 11 SP3 |x86-64   |3.0.76-0.11     |
|Ubuntu12.04 |x86-64   |3.2.0-37        |
|Ubuntu13.10 |x86-64   |3.11.0-12       |
|Ubuntu14.04 |x86-64   |3.13.0-24       |
|Ubuntu14.04 |ppc64le  |3.13.0-32       |
|Ubuntu15.10 |x86-64   |4.2.0-16        |
|Centos6.3   |x86-64   |2.6.32-279      |
|Centos6.0   |x86-64   |2.6.32-71       |


 * SHArP is tested on:
     * Intel architecture: RHEL 7.2 (3.10.0-327).
     * PPC architecture (little-endian): Ubuntu 14.04 (3.13.0-32), Power8.

System configuration
-------------------------------------------------------------------------------

* Each compute node needs to run a local SHArP daemon.
* Only one instance of AM is allowed.
* AM and SM have to share the same server.

```
  +--------------------------------------+    +---------------------------------+
  |  Compute host                        |    | Dedicated server                |
  |                                      |    |                                 |
  |  +---------------+  +-------------+  |    |  +------------+  +-----------+  |
  |  | libsharp      +-->  SD         |  |    |  |  AM        |  |  SM       |  |
  |  +---------------+  |             |  |    |  |            |  |           |  |
  |  | libsharp_coll |  |             |  |    |  |            |  |           |  |
  |  +---------------+  |             |  |    |  |            |  |           |  |
  |  | hcoll         |  |             |  |    |  |            |  |           |  |
  |  +---------------+  |             |  |    |  |            |  |           |  |
  |  | MPI process   |  +-------------+  |    |  +------------+  |           |  |
  |  |               |  |    SMX      +----------->SMX        |  |           |  |
  |  +---------------+  +-------------+  |TCP |  +------------+  +-----------+  |
  |                                      |    |                                 |
  +--------------------------------------+    +---------------------------------+
```


Installation - Linux
-------------------------------------------------------------------------------

To build SHArP from source, the following tools are needed:
 * autoconf
 * automake
 * libtool
 * pkg-config

If you get the SHArP sources from GitHub, you first have to generate
the configure script:

```shell
% ./autogen.sh
```

To build and install SHArP, run the following:

```shell
% module load mofed/hpcx
% ./configure --with-mpi=$OMPI_HOME --prefix=$PWD/install
% make install
% module unload mofed/hpcx
```

To compile debugging code, configure the project with the `--enable-debug` option:
```shell
% ./configure --with-mpi=$OMPI_HOME --prefix=$PWD/install --enable-debug
```
SHArP can use Google Protobuf (proto3) to serialize messages into human-readable form.
You can install Google Protobuf from
[Google Protobuf](https://github.com/google/protobuf).
Text serialization can be used only for debug and verification purposes. Binary serialization should be
used in a production environment.
```shell
% ./configure --with-mpi=$OMPI_HOME --prefix=$PWD/install --with-protobuf=<protobuf installation folder> --enable-debug
```

RPM installation:
```shell
# rpm -ivh <sharp.rpm>
```

DEB installation:
```shell
# dpkg -i <sharp.deb>
```

After installation, the following daemons are set up:

- *sharpd* - enabled in runlevels 3, 4 and 5
- *sharp_am* - disabled in all runlevels

To install or remove the daemons manually (requires root permission), use:
```
$prefix/bin/sharp_daemons_setup.sh or $top_source_dir/contrib/sharp_daemons_setup.sh
```

How to use the script:

```
Usage: sharp_daemons_setup.sh <-s SHArP location dir> <-r> <-a> <-d daemon>
    -s - Setup SHArP daemons
    -r - Remove SHArP daemons
    -a - All daemons (sharpd and sharp_am)[default]
    -d - Daemon name (sharpd or sharp_am)
```

Example of daemons configuration:

```shell
# $prefix/bin/sharp_daemons_setup.sh -s -d sharpd   # Setup sharpd daemon
# $prefix/bin/sharp_daemons_setup.sh -r             # Remove both sharpd and sharp_am daemons
```

After the setup procedure, the daemon startup scripts are:

```
/etc/init.d/sharpd
/etc/init.d/sharp_am
```

The daemon config files are:
```
/etc/sysconfig/sharpd
/etc/sysconfig/sharp_am
```

`$SHARP_OPTIONS`, as defined in the sysconfig file, is passed to the corresponding daemon as a parameter.

SHArP daemons can be run from a nonstandard location by setting the `SHARP_STARTUP_SCRIPT` environment variable.

Example:
```shell
# SHARP_STARTUP_SCRIPT=/my/script/location/sharp_am /etc/init.d/sharp_am start
```

Also, `SHARP_DEVEL` may be set to force the daemon script to use the `$prefix` dir for `lockfile` and `pidfile` instead of the default locations (`/var/lock/subsys` and `/var/run` respectively).

Example:
```shell
# SHARP_DEVEL=1 /etc/init.d/sharpd start
```



**Unit Test**

```
% make unittest
% make gtest
```

**Run Valgrind Test**

```
% make valgrind
```

**Run MPI Test**

```
% make runtest
```

Using libsharp
-------------------------------------------------------------------------------

To compile a package using libsharp, you need to provide
CFLAGS and LDFLAGS. libsharp is integrated with pkg-config.
If you use pkg-config, you can use it to get the compilation
flags:

```
PKG_CONFIG_PATH=<sharp destination folder> pkg-config --cflags sharp # prints CFLAGS
PKG_CONFIG_PATH=<sharp destination folder> pkg-config --libs sharp # prints LDFLAGS
```

Using SharpD
-------------------------------------------------------------------------------

The SharpD executable is run as root. It can run as a daemon (default)
or as a standard process (with the -P option).

SharpD currently supports IB device discovery via umad and
does not use uverbs, so it does not handle local verbs events.
As such, it does not handle LID change events, so currently
SharpD _must_ be started after OpenSM is started.

SharpD supports an options file. Options file currently supports
the following options:
```
Usage: sharpd [OPTION]:

OPTIONS:
  -O, --config_file <value>:
        Configuration file
        default value: /etc/sharp/sharpd.cfg

  --log_file <value>:
        Log file
        default value: /var/log/sharpd.log

  --log_level <value>:
        sharpd log verbosity mask:
        0x1 - Default
        0x2 - Verbose
        0x4 - Control
        0x8 - Debug
        0xff - All
        default value: 1

  --log_flush <value>:
        Determines whether to flush log on every message
        default value: TRUE

  --accum_log_file <value>:
        Whether to accumulate or erase log file
        default value: FALSE

  -p, --pid_file <value>:
        Prevents running multiple  Sharp daemons simultaneously
        default value: /var/run/sharpd.pid

  --ib_dev <value>:
        IB device and port (device:port) to which sharp daemon binds to
        default value: (null)

  -P, --daemon_mode:
        Determines whether to run as a daemon_mode or a standard process

  -h, --help:
        Show usage and exit

  -v, --version:
        Print sharpd version and exit
```
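
Because `log_level` is a bit mask, several categories can be enabled at once by OR-ing their values. The short Python sketch below decodes a mask using the bit values from the usage text above; the helper name `describe_log_level` is hypothetical.

```python
# Bit values as listed for --log_level in the sharpd usage text.
LOG_BITS = {0x1: "Default", 0x2: "Verbose", 0x4: "Control", 0x8: "Debug"}


def describe_log_level(mask):
    """Return the names of the sharpd log categories enabled by a log_level mask."""
    return [name for bit, name in LOG_BITS.items() if mask & bit]


print(describe_log_level(0x1))    # the default mask
print(describe_log_level(0x5))    # Default and Control combined
print(describe_log_level(0xff))   # every category
```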


SharpD currently works with one port. The SHARP_IB_DEV environment
variable specifies the IB device name and port as follows:
```
<IB device name>:<port number>
```
For example, mlx5_0:2 would be mlx5_0 device port number 2.
If SHARP_IB_DEV environment variable is not specified, the first
active port is used.
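
The `<IB device name>:<port number>` format can be handled as in the Python sketch below. The functions `parse_ib_dev` and `ib_dev_from_env` are hypothetical helpers written for this example and are not part of SharpD; returning `None` stands for the "first active port" case described above.

```python
import os


def parse_ib_dev(value):
    """Split "<IB device name>:<port number>" into (device, port)."""
    device, sep, port = value.rpartition(":")
    if not sep or not device or not port.isdigit():
        raise ValueError("expected <IB device name>:<port number>, got %r" % value)
    return device, int(port)


def ib_dev_from_env(environ=os.environ):
    """Return (device, port) from SHARP_IB_DEV, or None when the variable is not set."""
    value = environ.get("SHARP_IB_DEV")
    return parse_ib_dev(value) if value else None


print(parse_ib_dev("mlx5_0:2"))
```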

When libsharp API is used for creating and acquiring jobs,
AM simulator has to be invoked before running SharpD in the
following manner:

```
SMX_SOCK_PORT=6127 $prefix/bin/sharp_smx_test \
			-n am -m regular -d \
			-i $top_source_dir/examples/sharp_smx_msgs.txt \
			-r <output file> &
$prefix/bin/sharpd
```

* sharpd should be run with root permissions.
* sharp_smx_test doesn't need special permissions.

When running SMX services (SD / AM) in TCP sockets mode, the network
interface is chosen automatically by using the first one found in UP
state. A specific interface can be selected via:
```
SMX_SOCK_INTERFACE=<INTERFACE> (eth0 / ib0 / ib1)
```


Logging
-------------------------------------------------------------------------------

The following logs are useful for SHArP troubleshooting:

* AM log. Default location is /var/log/sharp_am.log. The following parameters manage AM's log:

  * `log_file`
  * `log_verbosity`. Possible values: 1 - Errors; 2 - Warnings; 3 - Info;
    4 - Debug; 5 - Trace.
  * `verbose`
  * `log_max_backup_files`
  * `log_file_max_size`

* SD log. Default location is /var/log/sharpd.log. The following parameters control log creation:

   * `log_file`
   * `log_flush`
   * `log_level` - bit mask of log messages to display: default - 1 (default value),
     verbose - 2, control - 4, debug - 8
   * `accum_log_file` - whether to accumulate or erase log file (default 0 - erase)

* SHARP_COLL logging.

   * `SHARP_COLL_LOG_LEVEL` - Messages with a level higher than or equal to the selected one are printed.
     Possible values are: 0 - fatal, 1 - error, 2 - warn, 3 - info, 4 - debug, 5 - trace.

* SM log.

* SMX doesn't have its own logging system. It reports messages into the application log (AM or SD).

* ibis has its own log. Log location and size are configured using the following parameters:

  * `ibis_log_file`
  * `ibis_log_size`

Switch IB2 capabilities
-------------------------------------------------------------------------------

* Maximum node radix: 64.
* Maximum number of trees: 64. Tree number 63 is reserved.
* Maximum number of groups: 128.
* Data buffer size: 192K.
* Maximum number of QPs: 2K.
* Maximum operation size: 256 Bytes.
* Minimal MTU: 512 Bytes.
* Outstanding operations: Up to 384. The actual number of OSTs is limited by buffer size.


Limitations
-------------------------------------------------------------------------------

### v1.1.0

* Mellanox OS is not supported. SM has to run on a dedicated server.
* AM doesn't support topology updates. Adding or removing switches, rebooting a switch and adding new hosts
  require an lst file update and an AM restart. Compute host reboot and shutdown don't need an AM restart.
* Only one job is allowed per host.
* Only one tree can be allocated for a job.
* Only one IB port can be used in a host.
* AM doesn't support handover/failover. Each new instance of AM cleans SHArP resources
  in all discovered ANs.
* Only quasi fat-tree topology is supported.
* Only homogeneous fabrics are supported (all switches must be SHArP compatible).
* A non-homogeneous fabric (not all switches are SHArP compatible) needs manual configuration of
  trees and connections between hosts and ANs. See [TreesConfigurationFile.md](doc/TreesConfigurationFile.md).
* Partial software update is not supported. Each software update requires updating all components,
  including AM, SD and SMX.
* Only AM key 0 has been tested.


Known issues
-------------------------------------------------------------------------------

### v1.1.0

* SD and AM can be run on the same server.
* The custom signal handler in SD can get stuck during backtrace discovery. As a result, an MPI
  job could wait for an SD response forever.
* HCOLL asks for surplus group channels in some cases. The number of requested group channels
  is a function of the maximum socket id. If a socket with a lower id is not used, HCOLL
  still requests SHArP resources for it.


Contributing to the project
-------------------------------------------------------------------------------

See [CONTRIBUTING.md](.github/CONTRIBUTING.md)

References
------------------------------------------------------------------------------
[Multi-channel group]: https://github.com/Mellanox/sharp/wiki/Multi-channel-group
