Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

install tcmalloc #4564

Merged
merged 4 commits into from
Sep 8, 2021
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 28 additions & 0 deletions src/configure
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,8 @@ Configuration options:
--android-incdir=DIR Android include directory
--enable-kenlm enable kenlm runtime library installed in "../tools/kenlm"
via extras/install_kenlm_query_only.sh, [default=no]
--enable-tcmalloc enable tcmalloc library installed in "../tools/gperftools"
via extras/install_tcmalloc.sh, [default=no]

Following environment variables can be used to override the default toolchain.
CXX C++ compiler [default=g++]
Expand Down Expand Up @@ -611,6 +613,7 @@ static_fst=false
static_math=false
android=false
enable_kenlm=false
enable_tcmalloc=false

FSTROOT=`rel2abs ../tools/openfst`
CUBROOT=`rel2abs ../tools/cub`
Expand Down Expand Up @@ -746,6 +749,9 @@ do
--enable-kenlm)
enable_kenlm=true
shift ;;
--enable-tcmalloc)
enable_tcmalloc=true
shift ;;
*) echo "Unknown argument: $1, exiting"; usage; exit 1 ;;
esac
done
Expand Down Expand Up @@ -1287,6 +1293,28 @@ else
appropriate configuration for this platform. Please contact the developers."
fi

if $enable_tcmalloc ; then
if [ -d ../tools/gperftools ]; then
TCMALLOC_ROOT=$(rel2abs ../tools/gperftools)
echo "Checking tcmalloc library in $TCMALLOC_ROOT"
if [ -f $TCMALLOC_ROOT/lib/libtcmalloc.so ]; then
if [ "$(uname)" == "Linux" ]; then
#If profiler is needed, use -ltcmalloc_and_profiler
echo "LDLIBS += -Wl,-rpath=$TCMALLOC_ROOT/lib -ltcmalloc" >> kaldi.mk
else
failure "Now, --enable-tcmalloc option is only supported for Linux"
fi
else
failure "Can't find libtcmalloc.so in ../tools/gperftools,
refer to ../tools/extras/install_tcmalloc.sh for instruction."
fi
else
failure "--enable-tcmalloc switched on, but can't find tcmalloc in
../tools/gperftools, refer to ../tools/extras/install_tcmalloc.sh
for instruction."
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would merge both cases into one. "Can't find tcmalloc in tools/gperftools. Run 'extras/install_tcmalloc.sh' in 'tools/' to install." is good enough for both. install_tcmalloc.sh will print an error if the user tries to do what message says, i.e run ../tools/extras/install_tcmalloc.sh. "Refer for instructions" is a lie, the script installs stuff without giving any :)

If you split lines, unindent the continuations, it will look weird when printed. failure takes any number of arguments and prints them in a single string with a space between, so if a message is concise enough, this will work to split First half and second half into two code lines:

    failure "First half and" \
            "second half"

Just FYI, you may stick $'\n' into a string to break lines, outside double quotes, but w/o space, or else Line 2 will start with a space, e.g.: "Line 1"$'\n'"Line 2" or $'Line 1\nLine 2' do the same thing. You can also do \' for ' inside a $-prefixed string.

fi

# Append the flags set by environment variables last so they can be used
# to override the automatically generated configuration.
echo >> kaldi.mk
Expand Down
88 changes: 88 additions & 0 deletions tools/extras/install_tcmalloc.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
#!/bin/bash
#
# Copyright 2021 Hang Lyu
#
# This script attempts to install tcmalloc and libunwind.
# The tcmalloc provides the more efficient way to malloc so that it can speed
# up the code, especially for the decoding code which contains massive memory
# allocation operations.
#
# At the same time, the tcmalloc also provides some profilers which can help
# the user to analysis the performance of the code. However, there may have
# some deadlock problems when you use the profilers with default build-in glibc.
# So the libunwind is recommanded to be installed. When the deadlock problems
# happen, the user can try to link with the library libunwind. But there may
# still be some crash on x64 platform. You can skip the installation about
# libunwind if you don't need it. (Link with "-lunwind" after your installation
# if you want to include it.)
#
# Depending on different platforms which are used by different users, the users
# also can try differnet malloc libraries such as tbbmalloc and so on. From our
# test, the tcmalloc is most efficient on our platform.


# Make sure we are in the tools/ directory.
if [ $(basename $PWD) == extras ]; then
cd ..
fi

! [ $(basename $PWD) == tools ] && \
echo "You must call this script from the tools/ directory" && exit 1;

# prepare libunwind
echo "****() Installing libunwind"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • libtcmalloc_minimal does not depend on libunwind, see comments below. We can shed this dependency.
  • libunwind comes with most distros and can be installed with apt (libunwind-dev) or yum (libunwid-devel), unless the version 1.5-rc is strictly required. Also, clang comes with its own version; I do not know how compatible they are. I'd left the choice to the user.

if [ ! -e libunwind-1.5-rc1.tar.gz ]; then
echo "Trying to download libunwind via wget"

if ! which wget >&/dev/null; then
echo "This script requires you to first install wget"
echo "You can also just download libunwind-1.5-rc1.tar.gz from"
echo "http://download-mirror.savannah.gnu.org/releases/libunwind/libunwind-1.5-rc1.tar.gz"
exit 1;
fi

wget http://download.savannah.nongnu.org/releases/libunwind/libunwind-1.5-rc1.tar.gz || exit 1;

if [ ! -f libunwind-1.5-rc1.tar.gz ]; then
echo "Download of libunwind failed!"
echo "Aborting script. Please download and install libunwind manually!"
exit 1;
fi
fi

# install libunwind
# The installation directory can be changed. But be careful, set the prefix as
#`pwd` will cause recursive problem.
tar vxzf libunwind-1.5-rc1.tar.gz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tar is extremely noisy with v. Up to you, but the listing of unpacked files is hardly useful to the end user.

(
cd libunwind-1.5-rc1
./configure --prefix=$PWD/install || exit 1;
make || exit 1;
make install || exit 1;
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exit exits the subshell here, i.e. the script continues past the closing ). Consider set -e, then you can get rid of most || exit checks. Otherwise, do this:

Suggested change
./configure --prefix=$PWD/install || exit 1;
make || exit 1;
make install || exit 1;
)
./configure --prefix=$PWD/install &&
make &&
make install
) || exit

All pipelining operators at end of line (&&, ||, |, |&) continue the pipe to next line without a need for the \.
|| exit preserves the non-zero exit code, explicit 1 is unnecessary.


# prepare tcmalloc
if [ -d gperftools ]; then
echo "$0: Assuming gperftools is already installed Please delete the directory"
echo "./gperftools if you need to reinstall."
mv gproftools gproftools_backup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message does not say what is really going on. Also gperftools != gproftools. You create the former below. Consider

Suggested change
echo "$0: Assuming gperftools is already installed Please delete the directory"
echo "./gperftools if you need to reinstall."
mv gproftools gproftools_backup
echo "$0: existing 'gperftools' subdirectory is renamed 'gperftools.bak'"
mv -f gperftools gperftools.bak || exit

mv -f needed in case another gperftools.bak exists.

|| exit handles mv's failure. set -e sometimes works better, but then you need to take care not to return non-zero from any pipeline.

The extension .bak should be already ignored in .gitignore, either at the root level or in tools/. If not, it's a good idea to have added it (as *.bak; this will ignore both matching files and directories). In any case, make sure that all downloaded files and created directories are ignored by Git.

fi
git clone https://github.com/gperftools/gperftools.git gperftools

# install tcmalloc
wd=$PWD
cd gperftools
bash autogen.sh
./configure --prefix=$PWD || exit 1;
make || exit 1;
make install || exit 1;
cd ..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We did not have a dependency on Autoconf/Automake before. Better, get a pre-autoconf'ed distribution tarball from https://github.com/gperftools/gperftools/releases/download/gperftools-2.9.1/gperftools-2.9.1.tar.gz.
  • Use the same subshell pattern to keep PWD?
  • Do we need the whole thing, with the heap profiler and heap checker? I think that for the default user build, the default should be -ltcmalloc_minimal, which skips all the stuff. I think 5 people at most need it. :) This also gets rid of the dependency on libunwind. Users who want and know how to profile are certainly qualified to reconfigure and build as much of the toolset as they want (and can install libunwind with a package manager). If you do this, change the library name in 'configure' too.
Suggested change
git clone https://github.com/gperftools/gperftools.git gperftools
# install tcmalloc
wd=$PWD
cd gperftools
bash autogen.sh
./configure --prefix=$PWD || exit 1;
make || exit 1;
make install || exit 1;
cd ..
wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.9.1/gperftools-2.9.1.tar.gz &&
tar xzf gperftools-2.9.1.tar.gz &&
mv gperftools-2.9.1 gperftools || exit
# install tcmalloc
(
cd gperftools &&
./configure --prefix=$PWD --enable-minimal --disable-debugalloc --disable-static &&
make &&
make install
) || exit


# add path
(
wd=$PWD
echo export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$wd/gperftools/lib
echo export PATH=$PATH:$wd/gperftools/bin
) >> env.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • $LD_LIBRARY_PATH and $PATH will be expanded before writing to env.sh; you probably want to keep the $ literal in env.sh, not capture the current PATH.
  • A subshell is unnecessary; you can redirect output of a block.
  • From "man ld.so" on LD_LIBRARY_PATH: A zero-length directory name indicates the current working directory. This is not what you want if LD_LIBRARY_PATH is not set. An extra check for empty/unset variable is needed not to add a :. :+ in a variable substitution makes it expand to empty string, otherwise to the expasnion of an expression after the :+. PATH is usually set, so it's safe to assume $PATH: does not expand to : (the rule for PATH is the same: an empty string after splitting at every : means current directory).
Suggested change
(
wd=$PWD
echo export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$wd/gperftools/lib
echo export PATH=$PATH:$wd/gperftools/bin
) >> env.sh
{
echo export LD_LIBRARY_PATH=\${LD_LIBRARY_PATH:+\${LD_LIBRARY_PATH}:}$PWD/gperftools/lib
echo export PATH=\$PATH:$PWD/gperftools/bin
} >> env.sh

Also:

  1. gperftools/bin will not be built if we use minimal tcmalloc.
  2. Why do we need LD_LIBRARY_PATH? We're building with -ltcmalloc[_minimal], so the path to the .so is baked into a binary anyway. I think it's needed only for libunwind?


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove empty line