

             FCA v2.2 Quick Start Guide ($Rev$):
             ==========================================

    Overview:

- For a fresh FCA v2.2 installation, refer to chapters 1-3.
- For an upgrade from an earlier FCA version, refer to the appendix
  "Upgrading from older FCA version".
- For running examples and general information, refer to chapter 4 and the
  appendices.



1.  Pre-requisites:

    - Mellanox IB QDR/FDR Switches, Voltaire IB QDR switches with software version v3.5 or later.
    - Mellanox CX2/CX3 HCA with firmware version 2.7.8200 or later. 
      The HCA firmware can be downloaded from:
      http://www.mellanox.com/content/pages.php?pg=firmware_download
    - OpenMPI v1.5.5 or later from http://www.open-mpi.org/
    - Mellanox OFED 1.5.3-3.0.0 or later
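
    The minimum versions listed above can be checked with a small version
    comparison. The following is a minimal sketch, assuming GNU coreutils
    ("sort -V") is available; the firmware version in the example is a
    hard-coded placeholder, not the output of a real device query.

```shell
#!/bin/sh
# Return success (0) when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort), a GNU coreutils extension.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Placeholder value. On a real node it might be taken from the "fw_ver"
# line printed by ibv_devinfo (assumption, not covered by this guide).
fw_ver="2.9.1000"
if version_ge "$fw_ver" "2.7.8200"; then
    echo "HCA firmware $fw_ver is recent enough"
else
    echo "HCA firmware $fw_ver is older than 2.7.8200"
fi
```

    The same helper can be reused for the OpenMPI and OFED minimum versions.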

2.  Install FCA Manager on dedicated node

    a.  The FCA Manager can be installed from an RPM (Option A) or a tarball
    (Option B). Select one of the installation options according to
    your site installation policy.

        - Option A: Install FCA Manager from RPM
            
            * Note: Select this option if you prefer to install FCA Manager on
            * the machine's local disk and let the RPM package handle all
            * post-install tasks
            ------------------------------------------------------------------

            # rpm -ihv fca-2.2.3852-1.x86_64.rpm

            * Note: Set an environment variable pointing to the install
              location of FCA in your login profile
            ------------------------------------------------------

            # export FCA_MGR_HOME=/opt/mellanox/fca

            * Note: Optional step: Configure FCA Manager to start 
              automatically after boot
            -----------------------------------------------------

            # /etc/init.d/fca_managerd install_service

        - Option B: Install FCA Manager from Tarball

            * Note: Select this option if you wish to install FCA Manager in
            * any location (user's home directory, NFS shared folder, etc.)
            * There are a number of post-install tasks that need to be applied
            * as root after tarball installation

            # mkdir -p /usr/local/mellanox
            # cd /usr/local/mellanox
            # tar zxvf fca-2.2.3852-1.x86_64.tar.gz

            * Note: Please run post-install script
            --------------------------------------

            # cd fca-2.2.3852-1.x86_64
            # ./scripts/udev-update.sh

            * Note: Set an environment variable pointing to the extracted
              location of FCA in your login profile
            ------------------------------------------------------

            # export FCA_MGR_HOME=/usr/local/mellanox/fca-2.2.3852-1.x86_64

    b.  Start FCA Manager

        # $FCA_MGR_HOME/scripts/fca_managerd start
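
    Before starting the manager it can help to confirm that FCA_MGR_HOME
    points at a usable installation. The sketch below only checks for the two
    paths this guide refers to (scripts/fca_managerd and
    etc/fca_manager_spec.ini); the function name is hypothetical and the
    check is not part of FCA itself.

```shell
# Return 0 when FCA_MGR_HOME is set and contains the files this guide
# refers to; print the reason to stderr and return 1 otherwise.
check_fca_mgr_home() {
    if [ -z "${FCA_MGR_HOME:-}" ]; then
        echo "FCA_MGR_HOME is not set" >&2
        return 1
    fi
    for f in scripts/fca_managerd etc/fca_manager_spec.ini; do
        if [ ! -e "$FCA_MGR_HOME/$f" ]; then
            echo "missing: $FCA_MGR_HOME/$f" >&2
            return 1
        fi
    done
    echo "FCA_MGR_HOME looks usable: $FCA_MGR_HOME"
}
```

    A possible use is to chain it with the start command:
    check_fca_mgr_home && $FCA_MGR_HOME/scripts/fca_managerd start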


3.  Install FCA MPI support libraries

    a. Install the FCA package from RPM or tarball. Select only one of the
    options below.

        - Option A: Install FCA on all cluster nodes from RPM (as root):

            # rpm -ihv fca-2.2.3852-1.x86_64.rpm        

            * Note: Set an environment variable pointing to the install
              location of FCA in the user's login profile
            ------------------------------------------------------

            # export FCA_HOME=/opt/mellanox/fca

        - Option B: Install FCA from Tarball in the shared NFS location.

            $ mkdir -p $HOME/mellanox
            $ cd $HOME/mellanox
            $ tar zxvf fca-2.2.3852-1.x86_64.tar.gz

            * Note: Set an environment variable pointing to the extracted
              location of FCA in the user's login profile
            ------------------------------------------------------

            # export FCA_HOME=$HOME/mellanox/fca-2.2.3852-1.x86_64

            * Note: Run the following post-install script as root on all
              cluster nodes
            ---------------------------------------------------------------

            # $FCA_HOME/scripts/udev-update.sh


    b. Build OpenMPI 1.5.x with FCA support

        * Note: Download OpenMPI 1.5.x from OpenMPI site
        ------------------------------------------------

        $ mkdir -p $HOME/openmpi
        $ cd $HOME/openmpi
        $ wget http://www.open-mpi.org/software/ompi/v1.5/downloads/openmpi-1.5.5.tar.gz
        $ tar zxvf openmpi-1.5.5.tar.gz
        $ cd openmpi-1.5.5
        $ ./autogen.sh
        $ mkdir -p build-fca
        $ cd build-fca
        $ ../configure --prefix=$PWD/install --with-fca=$FCA_HOME --with-openib
        $ make
        $ make install
        $ export MPI_HOME=$HOME/openmpi/openmpi-1.5.5/build-fca/install      

    c. Check OpenMPI with FCA installation

        * Note: The matching FCA parameters should be displayed
          in the command output
        ------------------------------------------------------

        $ $MPI_HOME/bin/ompi_info --param coll fca | grep fca_enable


4.  Running MPI jobs with FCA

    a. See the example scripts that show how to run MPI jobs with FCA for
    different MPI implementations:

    * For OpenMPI:       $FCA_HOME/scripts/run-ompi-fca.sh
    * For Platform MPI:  $FCA_HOME/scripts/run-pmpi-fca.sh
    * For Intel MPI:     $FCA_HOME/scripts/run-impi-fca.sh
    * For MVAPICH2:      $FCA_HOME/scripts/run-mvapich2-fca.sh

    b. Check $FCA_HOME/etc/fca_mpi_spec.ini file for various FCA tuning
    options.


6.  Appendix A: Configuration parameters of FCA Manager
    
    The FCA Manager process reads its configuration on startup from file 
    $FCA_MGR_HOME/etc/fca_manager_spec.ini

    The FCA Manager configuration file is in INI format and contains two sections:
    "fmm" and "ib".

    6.1 Configuration parameters for "fmm" section

    Here is a list of available configuration options for "fmm" section:

    a. osm_type = <string> 

    Description:    Select subnet manager service provider. 
    Valid values:   <opensm|ufm|autodetect>
                    ufm         - Use UFM
                    opensm      - Use OpenSM library
                    autodetect  - Detect automatically from fabric

    Default:        opensm
    Example:        osm_type = opensm
    Limitations:    When embedded OpenSM is used in the switch, the fca module
                    should be disabled in that specific switch.

    b. optimize_torus = <int>

    Description:    Enable/Disable optimizations for torus topology.
                    The torus optimizations are available only with
                    "osm_type = opensm".

    Valid values:   <0|1>
                    0 - disable
                    1 - enable

    Example:        optimize_torus = 1


    c. ufm_url = <string>

    Description:    URL for UFM's OpenSM service.
    Valid values:   http://host:port/
    Default:        http://localhost:8081/
    Example:        ufm_url = http://foo:8081/


    d. debug_level = <int>

    Description:    Verbosity level of FCA Manager for debugging. The valid debug
    levels are: 0 (fatal), 1 (error), 2 (warn), 3 (info), 4 (debug), 5,6,7 (a lot 
    of detailed debug info)

    Valid values:   0..7
    Default:        3
    Example:        debug_level = 3

    e. log_file = <string>

    Description:    FCA Manager log filename. The log file name can contain
    printf-like tokens, which are substituted when the log file is created:

    %H - hostname where FCA Manager is running
    %D - current date in format DDMMYYYY
    %T - current thread ID

    Valid values:   Any string representing log file name
    Default:        fmm_%H_%D.log
    Example:        log_file = fmm_%H_%D.log
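
    To see what name a given pattern produces, the token substitution can be
    imitated in the shell. This is only an illustration of the %H and %D
    tokens described above, not the manager's own code; the function name is
    hypothetical and the %T token is left untouched.

```shell
# Expand the %H and %D tokens of a log_file pattern.
# $1 = pattern, $2 = hostname, $3 = date already formatted as DDMMYYYY.
expand_log_name() {
    printf '%s\n' "$1" | sed -e "s/%H/$2/g" -e "s/%D/$3/g"
}

# Preview the default pattern for the local host and today's date.
expand_log_name 'fmm_%H_%D.log' "$(uname -n)" "$(date +%d%m%Y)"
```

    For example, expand_log_name 'fmm_%H_%D.log' node01 31122012 prints
    fmm_node01_31122012.log.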

    f. log_file_max_size = <int>

    Description:    Size threshold in MB above which the log file is rolled.
    If set to zero, rolling is disabled and the file size is unlimited.

    Valid values:   0 or any positive integer
    Default:        10
    Example:        log_file_max_size = 10

    g. log_file_max_backup_files = <int>

    Description:    Number of backup log files to keep. Effective only when
    the log_file_max_size parameter is greater than zero.

    Valid values:   any positive integer
    Default:        20
    Example:        log_file_max_backup_files = 20

    h. enable_stdout = <char>

    Description:    Print logging information to stdout.
    Valid values:   <y|n>
    Default:        y
    Example:        enable_stdout = n

    i. strategy = <string>

    Description:    Tree building strategy. Build SW tree or use icpu,
    if applicable.

    Valid values:   <software|simple>
    Default:        software
    Example:        strategy = software

    j. max_comms = <int>

    Description:    Maximum number of concurrent communicators (comms)
    supported by the FMM.

    Valid values:   any positive integer up to 4096
    Default:        4096
    Example:        max_comms = 2000

    6.2 Configuration parameters for "ib" section

    a. dev_name = <string>

    Description:    If set, the specified IB device will be used for communication.
    Use the name as it appears in the /sys/class/infiniband/ directory.
    If not set, the first device with an ACTIVE port will be used.
    Valid values:   String representing an active IB device name
    Default:        Empty or commented out (auto-discovery is used)
    Example:        dev_name = mlx4_0

    b. port_num = <int>

    Description:    If set, the selected port number will be used on the matching device.
    If not set or zero, the first active port will be used
    Valid values:   Positive integer
    Default:        unset (use auto-discover)
    Example:        port_num = 2

    c. service_level = <int>

    Description:    Quality of Service (QoS) is offered in IB as a means to
    provide guarantees or minimum requirements for certain applications on
    the fabric. SL2VL mapping should be configured in OpenSM.
    Valid values:   0-15
    Default:        0
    Example:        service_level = 0
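
    Putting the parameters of this appendix together, a configuration file
    built from the documented defaults might look like the one generated
    below. This is a sketch: the function name and output path are
    hypothetical, parameters without a documented default (optimize_torus,
    dev_name, port_num) are left out, and the file shipped with FCA remains
    the authoritative template.

```shell
# Write a sample fca_manager_spec.ini to the path given as $1, using the
# default values listed in this appendix.
write_sample_manager_ini() {
    cat > "$1" <<'EOF'
[fmm]
osm_type = opensm
debug_level = 3
log_file = fmm_%H_%D.log
log_file_max_size = 10
log_file_max_backup_files = 20
enable_stdout = y
strategy = software
max_comms = 4096

[ib]
service_level = 0
EOF
}

# Generate the sample in a scratch location, not in $FCA_MGR_HOME/etc.
write_sample_manager_ini "${TMPDIR:-/tmp}/fca_manager_spec.sample.ini"
```

    Writing to a scratch location first makes it possible to compare the
    sample against the installed file before changing anything.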


    

7.  Appendix B: Configuration parameters of FCA MPI Runtime Library

    The FCA runtime library is used by MPI to offload collective operations
    into IB switches. The FCA runtime library can be provided with external
    configuration parameters. These configuration parameters are read by FCA
    from a file or from the shell environment.

    * The configuration file for FCA MPI runtime libraries can be found at 
    $FCA_HOME/etc/fca_mpi_spec.ini 

    * The FCA parameters can also be provided to the MPI job from the shell
    environment. Use the following convention to pass an FCA parameter from
    the shell environment:

        export fca_<ini_section_name>_<ini_section_param_name>=value

        Example:
        
        export fca_mpi_debug_level=5

        or provide it to OpenMPI as command line argument:

        mpirun -x fca_mpi_debug_level=5 <... other mpirun parameters ...>
    
    * The FCA runtime configuration file is in INI format and contains two sections:
    "mpi" and "ib".
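
    The environment naming convention described above (fca_, then the INI
    section name, then the parameter name) can be wrapped in a small helper.
    This is a minimal sketch; the helper name is hypothetical and the values
    are examples only.

```shell
# Print the environment variable name for an INI section ($1) and a
# parameter of that section ($2), following fca_<section>_<param>.
fca_env_name() {
    printf 'fca_%s_%s\n' "$1" "$2"
}

# Export [mpi] debug_level = 5 and [mpi] collect_stats = y for this shell.
export "$(fca_env_name mpi debug_level)=5"
export "$(fca_env_name mpi collect_stats)=y"

# Show the FCA "mpi" section variables that are now set.
env | grep '^fca_mpi_'
```

    Variables exported this way still have to reach the MPI ranks, for
    example with the mpirun -x option shown above.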

    7.1 Configuration parameters for "mpi" section

    a. debug_level = <int>

    Description:    Verbosity level of FCA for debugging. The valid debug
    levels are: 0 (fatal), 1 (error), 2 (warn), 3 (info), 4 (debug), 5,6,7 (a lot 
    of detailed debug info)

    Valid values:   0..7
    Default:        2
    Example:        debug_level = 3

    b. log_file = <string>

    Description:    FCA log filename. The log file name can contain
    printf-like tokens, which are substituted when the log file is created:

    %H - hostname of the process
    %u - current time in ms.
    %T - current thread ID
    %s - time in sec.
    %t - time in ticks.

    Valid values:   Any string representing log file name or empty for none.
    Default:        empty (no log file)
    Example:        log_file = fca_mpi_%H_%u.log

    c. enable_stdout = <char>

    Description:    Print logging information to stdout.
    Valid values:   <y|n>
    Default:        y
    Example:        enable_stdout = n

    d. fp_sum_fixedpoint = <char>

    Description:    Use fixed-point math when doing floating point summation, to keep
    a consistent result regardless of operation ordering.

    Valid values:   <y|n>
    Default:        n
    Example:        fp_sum_fixedpoint = n

    e. fp_sum_forceorder = <char>

    Description:    Force strict order of summation in collective operations, to keep
    a consistent result regardless of operation ordering.

    Valid values:   <y|n>
    Default:        n
    Example:        fp_sum_forceorder = n

    f. collect_stats = <char>

    Description:    Enable/Disable collecting statistics.
    Valid values:   <y|n>
    Default:        n
    Example:        collect_stats = y


    g. stats_max_ops = <int>

    Description:    Number of collective operations to save statistics for. This
    option is effective when "collect_stats = y".
    Valid values:   Positive integer
    Default:        1000
    Example:        stats_max_ops = 1000

    h. stats_file_name = <string>

    Description:    Filename to save statistics info, rank process ID will be appended.
    Valid values:   Filename or empty (auto-generate)
    Default:        fca_stats.xml
    Example:        stats_file_name = fca_stats.xml

    i. slow_num_polls = <int>

    Description:    Number of shared memory polls before going slow.
    Valid values:   Positive integer
    Default:        100000
    Example:        slow_num_polls = 100000

    j. slow_sleep = <int>

    Description:    Number of microseconds to sleep before calling MPI
                    callback.
    Valid values:   Positive integer
    Default:        100
    Example:        slow_sleep = 100

    7.2 Configuration parameters for "ib" section

    a. dev_name = <string>

    Description:    If set, the specified IB device will be used for communication.
    Use the name as it appears in the /sys/class/infiniband/ directory.
    If not set, the first device with an ACTIVE port will be used.
    Valid values:   String representing an active IB device name
    Default:        Empty or commented out (auto-discovery is used)
    Example:        dev_name = mlx4_0

    b. port_num = <int>

    Description:    If set, the selected port number will be used on the matching device.
    If not set or zero, the first active port will be used
    Valid values:   Positive integer
    Default:        unset (use auto-discover)
    Example:        port_num = 2


8.  Appendix C: Configuring rules for offloading
    
    The FCA system can be provided with user-defined rules that select the
    best offloading method for MPI communication and tune other options.
    The user-defined rules can consider the following MPI communicator
    parameters:

        - message size range (in bytes)
        - communicator size (in ranks)
        - data type and reduce op (for reduce and allreduce collective operations)

    The first rule that fits the parameters above can set the following options:

        - offloading method (CD - CoreDirect, UD, MPI native)
        - force order (whether to force a consistent order in collective operations)

    The fca_mpi_spec.ini file should contain a "rules" section where the user
    can enable or disable the dynamic rules system.


    8.1 Configuration parameters for "rules" section

    a. enable = <0|1>

    Description:    Enable/disable dynamic rules mechanism.
    Valid values:   <0|1>
                    0 - Disable
                    1 - Enable
    Default:        0 - Disable
    Example:        enable = 1

    8.2 Configuring a specific rule

    User-defined offloading rules should be added and enumerated in the
    fca_mpi_spec.ini file. Every user-defined rule is represented by a new
    INI file section, named in the following format:

        [rule-<coll_name>-<SN>]

    - coll_name can be one of the following values:

    * reduce
    * allreduce
    * bcast
    * barrier
    * allgather
    * allgatherv

    - SN is a rule serial number for the given coll_name
    - Default value for min/max params is -1, which means "no limit".

    - Valid offload types are:
    * ud           - use FCA in UD mode
    * cd (default) - use FCA in Core Direct mode
    * none         - don't use FCA

    - Rules are applied by the first match.
    If none of the rules match, the default is to use FCA with Core Direct mode.

    8.3 Valid rule parameters

    a. msg_size_min = <int>

    Description:    Minimum message size.
    Valid values:   -1 - No limit.
                    Positive integer (not greater than the maximum message size).
    Default:        -1 - No limit
    Example:        msg_size_min = 1024

    b. msg_size_max = <int>

    Description:    Maximum message size.
    Valid values:   -1 - No limit.
                    Positive integer (not less than the minimum message size).
    Default:        -1 - No limit
    Example:        msg_size_max = 2048

    c. comm_size_min = <int>

    Description:    Minimum communicator size
    Valid values:   -1 - No limit.
                    Positive integer (not greater than the maximum communicator size).
    Default:        -1 - No limit
    Example:        comm_size_min = 20

    d. comm_size_max = <int>

    Description:    Maximum communicator size
    Valid values:   -1 - No limit.
                    Positive integer (not less than the minimum communicator size).
    Default:        -1 - No limit
    Example:        comm_size_max = 40

    e. offload_type = <string>

    Description:    FCA offload type
    Valid values:
                    ud           - use FCA in UD mode
                    cd (default) - use FCA in Core Direct mode
                    none         - don't use FCA, fall back to MPI
    Default:        cd (Core Direct mode)
    Example:        offload_type = ud

    f. data_type = <string>

    Description:    Data type given as parameter
    Valid values:
                    All MPI types (e.g. MPI_CHAR)
    Example:        data_type = MPI_CHAR

    g. reduce_op = <string>

    Description:    Reduce operation type requested
    Valid values:
                    All MPI reduce operations (e.g. MPI_BXOR)
    Example:        reduce_op = MPI_BXOR

    h. force_order = <int>

    Description:    Whether to force a consistent order in reduce and allreduce
                    collective operations (for floating point accuracy).
    Valid values:   0 - Do not force order.
                    Positive integer - force order.
    Default:        0 - Do not force order.
    Example:        force_order = 1

    ## Sample reduce rules

    [rules]
    enable         = 1

    [rule-reduce-1]
    msg_size_min   = 256
    msg_size_max   = 1024
    comm_size_min  = 30
    comm_size_max  = 35
    offload_type   = ud
    data_type      = MPI_CHAR
    reduce_op      = MPI_LXOR

    [rule-reduce-2]
    msg_size_min   = 1
    msg_size_max   = 2024
    comm_size_min  = 1
    comm_size_max  = 10
    offload_type   = none
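
    The first-match behaviour of the sample rules above can be traced with a
    short shell sketch. It is an illustration only, not FCA code: it matches
    on message size and communicator size, ignores data_type and reduce_op,
    and hard-codes the two sample rules. The function names are hypothetical.

```shell
# Return 0 when value $1 lies within [$2, $3]; a bound of -1 means "no limit".
in_range() {
    { [ "$2" -eq -1 ] || [ "$1" -ge "$2" ]; } &&
    { [ "$3" -eq -1 ] || [ "$1" -le "$3" ]; }
}

# Print the offload type for message size $1 (bytes) and communicator
# size $2 (ranks). Each rule line below is:
#   msg_size_min msg_size_max comm_size_min comm_size_max offload_type
select_offload() {
    while read -r msg_min msg_max comm_min comm_max offload; do
        if in_range "$1" "$msg_min" "$msg_max" &&
           in_range "$2" "$comm_min" "$comm_max"; then
            echo "$offload"    # first matching rule wins
            return 0
        fi
    done <<'EOF'
256 1024 30 35 ud
1 2024 1 10 none
EOF
    echo "cd"    # no rule matched: the documented default is Core Direct
}

select_offload 512 32      # prints "ud"   (rule-reduce-1)
select_offload 100 5       # prints "none" (rule-reduce-2)
select_offload 4096 100    # prints "cd"   (no rule matches)
```

    Because the first match wins, narrower rules should be listed with lower
    serial numbers than broader ones.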

9.  Appendix D: OpenMPI MCA parameters to control FCA offload

    * The complete list of OpenMPI FCA related parameters can be extracted with ompi_info 
    command, for example: 

        $MPI_HOME/bin/ompi_info --param coll fca

    * The MCA parameters should be provided to OpenMPI mpirun command in the following format:

        $MPI_HOME/bin/mpirun -mca <param> <value>

        for example:

        $MPI_HOME/bin/mpirun -mca coll_fca_verbose 1 <... other mpirun args ...>

    Here is a list of MCA parameters for FCA:

    a. coll_fca_priority = <int>

    Description:        Priority of the fca coll component
    Default:            80

    b. coll_fca_verbose = <int>

    Description:        Verbose level of the fca coll component
    Default:            0

    c. coll_fca_enable = <0|1>

    Description:        Enable/Disable Fabric Collective Accelerator
    Default:            1

    d. coll_fca_spec_file = <string>

    Description:        Path to the FCA configuration file fca_mpi_spec.ini
    Default:            $FCA_HOME/etc/fca_mpi_spec.ini

    e. coll_fca_library_path = <string>
                          
    Description:        Path to FCA runtime library
    Default:            $FCA_HOME/lib/libfca.so

    f. coll_fca_np = <int>

    Description:        Minimum number of processes (NP) in a job required to activate FCA
    Default:            64

    g. coll_fca_enable_barrier = <0|1>

    Description:        Enable/Disable FCA Barrier support
    Default:            1

    h. coll_fca_enable_bcast = <0|1>

    Description:        Enable/Disable FCA Bcast support
    Default:            1

    i. coll_fca_enable_reduce = <0|1>

    Description:        Enable/Disable FCA Reduce support
    Default:            1

    j. coll_fca_enable_allreduce = <0|1>

    Description:        Enable/Disable FCA Allreduce support
    Default:            1

    k. coll_fca_enable_allgather = <0|1>

    Description:        Enable/Disable FCA Allgather support
    Default:            1

    l. coll_fca_enable_allgatherv = <0|1>

    Description:        Enable/Disable FCA Allgatherv support
    Default:            1
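
    Several MCA parameters are often combined on one mpirun command line.
    The sketch below only prints the resulting command so it can be reviewed
    before use; the helper name, the process count and the program name
    ./my_app are hypothetical, while -np and -mca are standard mpirun options.

```shell
# Print (do not run) an mpirun command line that sets MCA parameters.
# $1 = number of processes, $2 = program, remaining args = "name=value" pairs.
fca_mpirun_cmd() {
    np=$1
    prog=$2
    shift 2
    cmd="mpirun -np $np"
    for pair in "$@"; do
        # Split "name=value" into the two words expected after -mca.
        cmd="$cmd -mca ${pair%%=*} ${pair#*=}"
    done
    echo "$cmd $prog"
}

# Enable FCA, lower the NP threshold to 32 and turn off the FCA barrier.
fca_mpirun_cmd 128 ./my_app coll_fca_enable=1 coll_fca_np=32 coll_fca_enable_barrier=0
```

    The example prints:
    mpirun -np 128 -mca coll_fca_enable 1 -mca coll_fca_np 32 -mca coll_fca_enable_barrier 0 ./my_app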


10. Appendix E: Upgrading from older FCA version

    To upgrade from an older FCA version, apply the following steps:

        * Upgrade FCA module in all IB QDR switches
        * Upgrade FCA Manager
        * Upgrade FCA MPI runtime support library on all cluster nodes

    a. Apply commands from chapter (4) to upgrade FCA version in the switches.

    b. Save the FCA Manager configuration files to a location outside the
    directory where FCA was installed

        cp $FCA_MGR_HOME/etc/*.ini $HOME/

    c. Save the FCA runtime library configuration files to a location outside
    the directory where the FCA runtime was installed

        cp $FCA_HOME/etc/*.ini $HOME/

    d. Stop running FCA Manager
        
        $FCA_MGR_HOME/scripts/fca_managerd stop

    e. Uninstall FCA Manager

        rpm -e fca

        * Note: if FCA Manager was installed from tarball, stop 
        * the FCA Manager: 
        ---------------------------------------------------------------

    f. Uninstall FCA MPI runtime support library from all cluster nodes

        rpm -e fca

    g. Follow chapters 1-3 to install the new version of FCA Manager and the
    MPI runtime library.
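
    The two configuration backup steps above copy every *.ini file into
    $HOME. A possible variant, sketched below, keeps the manager and runtime
    files in separate backup directories so they cannot be mixed up; the
    function name and the backup directory names are hypothetical.

```shell
# Copy every *.ini file from directory $1/etc into backup directory $2.
# Returns 1 when the backup directory cannot be created, when no .ini
# file is found, or when a copy fails.
backup_ini() {
    src="$1/etc"
    dest="$2"
    mkdir -p "$dest" || return 1
    for f in "$src"/*.ini; do
        [ -e "$f" ] || { echo "no .ini files in $src" >&2; return 1; }
        cp -p "$f" "$dest/" || return 1
    done
}
```

    Possible usage before uninstalling:
    backup_ini "$FCA_MGR_HOME" "$HOME/fca-backup/manager"
    backup_ini "$FCA_HOME" "$HOME/fca-backup/mpi"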


That's all, folks!
