Testing

We use pytest for the test module. The dependency has already been included in the requirements.txt file and should be installed automatically with aweSOM.

Functionality tests

Run tests for all modules in the root directory of the repository:

python -m pytest

You can also run specific test modules by specifying the path to the test file:

python -m pytest tests/[module]_test.py

Or run a specific test function within a module:

python -m pytest tests/[module]_test.py::test_[function]

If there is no GPU, or if the GPU is not CUDA-compatible, the sce_test.py module will fail partially. This is expected behavior, and SCE computation should still fall back to the CPU.

Performance tests

aweSOM includes many additional features compared to the original implementation of POPSOM and ensemble learning. Therefore, it is not possible to directly compare the performance of aweSOM with these legacy packages. However, we can still make a rough comparison by mimicking the original implementation of ensemble learning and POPSOM in aweSOM, and then benchmarking their performance.

First install the dependencies for these legacy packages:

pip install -r tests/performance/requirements.txt

Then run the performance tests inside the aweSOM/tests/performance/ directory.

Benchmarking aweSOM against POPSOM

python popsom_bench.py --N 10000 --F 4

where N is the number of points and F is the number of features. The script will train a POPSOM map and an aweSOM map given the same mock dataset, and compare the training time of the two algorithms.

Additionally, high-level controls include: --nodes to specify the number of nodes in the lattice, which might be useful for isolated scaling tests; --procedure [training, mapping, both] to specify which part of the algorithm to benchmark; and --popsom or --awesom to specify one of the two algorithms to benchmark separately.

If you are running a long-duration test that requires dedicated node(s), you can refer to examples/slurm_scripts/submit_popsom_bench.cpu for an example SLURM script to run this benchmark.

For a personal computer, we recommend using a smaller number of points (\(N \sim 10^4\)) and features (\(F < 5\)) for the test to complete in a reasonable amount of time. More extensive tests can be run on a high-performance computing cluster. For example, one modern compute node with 40+ cores can perform this benchmark up to \(10^6\) points and \(\sim 20\) features. POPSOM generally cannot handle more than \(10^6\) points, since training time can exceeds 2 hours at these parameters and/or an out-of-memory error will be raised (even with 1 TB of memory per node).

The expected performance of aweSOM is a speedup of \(\sim 8-20 \times\) compared to POPSOM, depending on the number of points and features.

Benchmarking aweSOM against ensemble learning

python sce_bench.py --N 100000 --R 20

where N is the number of points and R is the number of independent realizations. The script will generate mock cluster IDs for the dataset and save them as npy files. Then it will perform the SCE analysis on the dataset using both the aweSOM and legacy implementations, and compare the training time of the two algorithms.

Additionally, high-level controls include: --C to specify the number of clusters per realization; --legacy or --awesom to specify one of the two algorithms to benchmark separately.

NOTE: If the test did not complete successfully, there will be a directory named som_out in the current working directory. This should be cleaned up manually.

If you are running a long-duration test that requires dedicated node(s), you can refer to examples/slurm_scripts/submit_sce_bench.cpu and examples/slurm_scripts/submit_sce_bench.gpu for example SLURM scripts to run this benchmark.

In general, the Numpy version of aweSOM is around \(2 \times\) faster than the legacy implementation. However, the GPU version of aweSOM is slower than the legacy implementation due to the overhead for small datasets (\(N < 5\times10^4\)). The GPU version of aweSOM is only faster for large datasets (\(N > 10^5\)), and is exponentially faster as you scale up beyond \(N \sim 10^6\).

We tested the performance of the SCE implementation on a single NVIDIA V-100 GPU with 32 GB of memory. At \(N = 10^6\) and \(R = 10\), aweSOM is faster than the legacy implementation by a factor of \(\sim 15\). At \(N = 10^7\) and \(R = 10\), aweSOM is faster by a factor of \(\sim 60\). In high-resolution simulations where \(L^3 \gtrsim 500, N = 10^8\), aweSOM is the only feasible option for performing the SCE analysis.