From b091a17b1ec7a5b546c2450bbd24bd26716c2f67 Mon Sep 17 00:00:00 2001
From: Honggang Li
Date: Sun, 4 Aug 2019 21:26:04 -0400
Subject: [PATCH] Fix segmentation fault in Linux containers

When running openmpi/mpirun in Linux containers, libfabric fails with
a segmentation fault:

  Signal: Segmentation fault (11)
  Signal code: Address not mapped (1)
  Failing at address: 0xfffffffffffffff0
  [ 0] /lib64/libpthread.so.0(+0x12d80)[0x14feb5d4dd80]
  [ 1] /lib64/libfabric.so.1(+0x23cd1)[0x14fea8105cd1]
  [ 2] /lib64/libfabric.so.1(+0x18240)[0x14fea80fa240]
  [ 3] /lib64/libfabric.so.1(fi_getinfo+0x695)[0x14fea80faea5]
  [ 4] /lib64/libfabric.so.1(fi_getinfo+0x4e)[0x14fea80ffe9e]
  [ 5] /usr/lib64/openmpi/lib/openmpi/mca_btl_usnic.so(+0xdf4e)[0x14fea8445f4e]
  [ 6] /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_btl_base_select+0xed)[0x14feb547815d]
  [ 7] /usr/lib64/openmpi/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x16)[0x14fea9fab2f6]
  [ 8] /usr/lib64/openmpi/lib/libmpi.so.40(mca_bml_base_init+0xa4)[0x14feb5ffef94]
  [ 9] /usr/lib64/openmpi/lib/libmpi.so.40(ompi_mpi_init+0x654)[0x14feb5fac474]
  [10] /usr/lib64/openmpi/lib/libmpi.so.40(MPI_Init+0x72)[0x14feb5fdc6b2]
  [11] /home/mpi/ring[0x4009ad]
  [12] /lib64/libc.so.6(__libc_start_main+0xf3)[0x14feb599a813]
  [13] /home/mpi/ring[0x4008be]

The 'scandir' function called by 'ofi_mem_init' returned -1 with errno
set to ENOENT. With n == -1, the condition 'n--' is nonzero, so the
loop body ran and indexed 'pglist' with a negative value. Test
'n-- > 0' instead so the loop is skipped when 'scandir' fails.

Fixes: 8ce14923ba67 (core/mem: Obtain a list of available huge pages in system)
Signed-off-by: Honggang Li
---
 src/mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mem.c b/src/mem.c
index 91836a79c..23617a0a4 100644
--- a/src/mem.c
+++ b/src/mem.c
@@ -84,7 +84,7 @@ void ofi_mem_init(void)
 		num_page_sizes = 1;
 	}
 
-	while (n--) {
+	while (n-- > 0) {
 		if (sscanf(pglist[n]->d_name, "hugepages-%zukB", &hpsize) == 1) {
 			hpsize *= 1024;
 			if (hpsize != page_sizes[OFI_DEF_HUGEPAGE_SIZE])
-- 
2.20.1
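
The effect of the one-line change can be reproduced outside libfabric. The following is a minimal sketch, not code from src/mem.c: the helper names `iterations_unguarded` and `iterations_guarded` and the `limit` cap are hypothetical, introduced only so that the demonstration terminates and never touches memory. It counts how many times each loop condition would enter the loop body for a given count, including -1, the value 'scandir' returns on failure.

```c
#include <assert.h>

/*
 * Hypothetical helper: counts how often the body of `while (n--)` runs.
 * For negative n the condition stays nonzero, so the count is capped at
 * `limit` to keep the demonstration finite.
 */
int iterations_unguarded(int n, int limit)
{
	int count = 0;

	assert(limit > 0);
	while (n--) {
		if (++count >= limit)
			break;
	}
	return count;
}

/*
 * Hypothetical helper: counts how often the body of `while (n-- > 0)`
 * runs. A negative or zero n never enters the loop.
 */
int iterations_guarded(int n)
{
	int count = 0;

	while (n-- > 0)
		count++;
	return count;
}
```

With n = -1 the unguarded form enters the body and only stops at the cap, while the guarded form runs zero times. For non-negative n both forms run the body the same number of times.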