------------------ Released version 2.6.2 ----------------------------

 * Support for
    - Score-P v9.0
    - OTF2 v3.1
    - Cube C Writer v4.9

 * Build system improvements:
    - Improved support for Cray EX systems:
       - Support Cray compilers via `--with-nocross-compiler-suite`.
       - By default, use the `cc`, `CC`, `ftn` compiler wrappers
         instead of MPI compiler wrappers.
    - Silenced most classic Intel compiler deprecation warnings.

 * Automatic trace analyzer changes & improvements:
    - Fixed callstack handling of OpenMP tasks for hybrid MPI+OpenMP
      programs.
    - Fixed various issues with OpenMP measurements via OMPT:
       - Handling of `teams` constructs executed on the host.
       - Handling of lock events in `ordered` constructs.
       - Handling of implementation barriers.
    - Fixed an issue in report writing that caused program regions
      to be missing in score reports of analysis results files.

 * Analysis report postprocessing changes:
    - Fixed classification of various MPI functions (mostly introduced
      with MPI 4.0).
    - Added metric hierarchies for HIP and OpenMP target offloading.
      (NOTE: The trace analysis still only supports host-side events!)


------------------ Released version 2.6.1 ----------------------------

 * Support for
    - Score-P v8.0, incl. OpenMP measurements via OMPT
    - OTF2 v3.0
    - Cube C Writer v4.8

 * Build system improvements:
    - Added proper support for additional compiler suites via
      `--with-nocross-compiler-suite=<suite>`:
       - `amdclang`: AMD ROCm compilers (amdclang, amdclang++)
       - `nvhpc`: NVIDIA HPC SDK compilers (nvc, nvc++)
       - `oneapi`: Intel oneAPI compilers (icx, icpx)

 * Automatic trace analyzer changes & improvements:
    - Fixed a performance regression when analyzing traces including
      OpenMP with few instrumented user regions.
    - Improved metric data aggregation and writing.

 * Measurement nexus (scan) changes:
    - Removed stalled resources counter from default POP multi-run
      preset and added an enhanced preset `pop-with-stalled-resources`.


------------------- Released version 2.6 -----------------------------

 * Build system improvements:
    - Auto-detect Cray XC platforms with ARM CPUs, supporting Cray,
      ARM, and GCC compilers.
    - Added support for Clang and AMD AOCC compilers.
    - Updated support for Spectrum MPI.

 * Automatic trace analyzer changes & improvements:
    - Revised "Early Reduce" wait state definition.
    - Added calculation of "Early Reduce" delay costs.
    - Fixed various delay cost calculation and propagation issues.
    - Fixed various inconsistencies between wait-state and root-cause
      analysis.
    - Made POSIX threads analysis consistent with Score-P by avoiding
      thread function stub call paths underneath 'pthread_create'.
      This also fixes a deadlock when analyzing traces containing
      "orphaned threads".

 * Measurement nexus (scan) changes:
    - Added preset mode for multi-run measurements, with a preset for
      POP analysis requirements as a use case.
    - Added support for multiple file systems in SCAN_TRACE_FILESYS
      by using a colon-separated list of paths.
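
      A minimal sketch of how a colon-separated SCAN_TRACE_FILESYS value
      could be interpreted. The helper names and the prefix-based
      matching are assumptions made for illustration only; this is not
      taken from the scan implementation, which may compare actual
      mount points instead.

```python
import os


def parse_trace_filesys(value):
    """Split a colon-separated path list, dropping empty entries.

    Hypothetical helper, not part of Scalasca.
    """
    return [os.path.abspath(p) for p in value.split(":") if p]


def is_on_listed_filesys(archive_dir, value):
    """Return True if archive_dir lies at or below one of the listed paths.

    Assumption: membership is decided by path prefix only.
    """
    archive = os.path.abspath(archive_dir)
    for root in parse_trace_filesys(value):
        # commonpath compares whole path components, so "/scratchy"
        # is not treated as being below "/scratch".
        if os.path.commonpath([archive, root]) == root:
            return True
    return False


if __name__ == "__main__":
    setting = os.environ.get("SCAN_TRACE_FILESYS", "/scratch:/work")
    print(parse_trace_filesys(setting))
    print(is_on_listed_filesys("/scratch/user/scorep_exp", setting))
```

      With the value `/scratch:/work`, an experiment directory below
      either `/scratch` or `/work` would be accepted under these
      assumptions.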

 * Analysis report postprocessing changes:
    - Added metric hierarchies for CUDA, OpenCL, and OpenACC.
      (NOTE: The trace analysis still only supports host-side events!)
    - Renamed '-c' command-line option of 'square' to '-C' for running
      sanity checks on newly created reports.
    - Added new '-c' command-line option to 'square' to allow specifying
      the number of counters considered during report scoring (for
      consistency with 'scorep-score').
    - Added new '-x' command-line option to 'square' to allow passing
      options directly through to 'scorep-score'.
    - Avoided unnecessary aggregation/postprocessing of reports with
      multi-run experiments.

 * Substantial code cleanup.


------------------- Released version 2.5 -----------------------------

 * Support for
    - Score-P v5.0, incl. virtual process/thread topologies

 * Automatic trace analyzer changes & improvements:
    - Various fixes and improvements in timestamp correction algorithm.
    - Fixed 'Late Receiver' instance tracking.
    - Slightly improved analysis report data collation.

 * Added support for multi-run experiments.

 * Code refactoring and various bug fixes.

 * Improved user documentation:
    - Revised User Guide including command reference.
    - Added man pages.


------------------- Released version 2.4 -----------------------------

 * Support for
    - Cube v4.4

 * Build system improvements:
    - Fixed build issues with compilers defaulting to C++11 or higher
      (e.g., Intel 2017, PGI 17).
    - Fixed build issues with PGI 16+ compilers (pgCC no longer
      available).
    - Fixed build issues on Cray systems, now also properly taking
      CRAYPE_LINK_TYPE setting into account.

 * Automatic trace analyzer changes & improvements:
    - Fixed rare crash/deadlock in critical-path/delay analysis while
      analyzing MPI persistent communication.
    - Improved memory management.
    - Improved handling of OTF2 traces in SIONlib containers.
    - Improved trace reading times, especially at scale.
    - Fixed detection of wait states in active-target synchronization
      based on EPIK traces.

 * Code refactoring and various bug fixes.


------------------ Released version 2.3.1 ----------------------------

 * Build system improvements:
    - Fixed build issue with GCC 6.1.
    - Fixed build issue on the Intel Xeon Phi platform.


------------------- Released version 2.3 -----------------------------

 * Support for
    - Score-P v2.0
    - OTF2 v2.0

 * Automatic trace analyzer changes & improvements:
    - Experimental support for Score-P traces collected using
      sampling (see OPEN_ISSUES for limitations).

 * Improved analysis report postprocessing:
    - Revised metric hierarchies (organization, metric naming, etc.).
    - Suppress calculation of performance properties that are
      only relevant for unused parallel programming models.

 * Performance property documentation fixes & improvements.

 * Build system improvements.

 * Code refactoring and various bug fixes.


------------------- Released version 2.2.2 ---------------------------

 * Platform support:
    - Fixed a build issue on the Intel Xeon Phi platform.
    - Improved support for the 'ibrun' launcher.

 * Automatic trace analyzer changes & improvements:
    - Worked around rare run-time issue with MVAPICH2.


------------------- Released version 2.2.1 ---------------------------

 * Platform support:
    - Added build system support for Power8/Linux.
    - Added build system support for 64-bit ARM/Linux (AArch64).
    - Prefer linking static over dynamic Cube/OTF2 libraries on
      Fujitsu K/FX10/FX100.

 * Automatic trace analyzer changes & improvements:
    - Fixed delay-cost propagation through OpenMP barrier wait states.
    - Various algorithmic optimizations reducing overall analysis
      time for traces of multi-threaded applications:
       - Improved memory management.
       - Improved trace preprocessing.
       - Improved timestamp correction.

 * Code refactoring and various bug fixes.


------------------- Released version 2.2 -----------------------------

 * Support for
    - Score-P v1.4
    - OTF2 v1.5, incl. full SIONlib support (if configured)
    - Cube v4.3

 * Platform support:
    - Added support for Intel Xeon Phi, native mode only.
    - Added support for Fujitsu FX100 (thanks to T. Nakamura,
      Fujitsu Ltd.).

 * Automatic trace analyzer changes & improvements:
    - Added basic support for POSIX threads.
    - Added basic support for OpenMP tasking.
    - Added lock contention analysis (OpenMP & POSIX threads).
    - Added root-cause/delay analysis (MPI & OpenMP).
    - New command-line options '--[no-]rootcause'.

 * Code refactoring and various bug fixes.


------------------- Released version 2.1 -----------------------------

 * Support for
    - Score-P v1.3
    - OTF2 v1.4

 * Platform support:
    - Added support for Fujitsu FX10 & K computer.
    - Improved support for Cray systems.

 * Automatic trace analyzer changes & improvements:
    - Added critical-path analysis.
    - Improved Late Receiver detection.
    - New command-line options '--[no-]critical-path' and '--single-pass'.
    - Fixed crash in data collation when number of OpenMP threads varied
      among MPI processes.

 * Code refactoring and various small bug fixes.

 * Initial version of updated User Guide (still work in progress).


------------------- Released version 2.0 -----------------------------

 * Support for
    - Score-P v1.2
    - OTF2 v1.2
    - Cube v4.2

 * New build system based on GNU autotools.

 * Significant amount of code refactoring.

 * Automatic trace analyzer changes & improvements:
    - Support for arbitrarily deep system trees.
    - Improved performance of timestamp correction.
    - Pattern instance tracking and statistics are now enabled by
      default.
    - New command-line options '--verbose', '--[no-]time-correct',
      and '--[no-]statistics'.
    - Limited backward-compatibility support for handling existing
      traces in EPILOG format generated by Scalasca v1.
