diff --git a/docs/src/advanced.md b/docs/src/advanced.md index 6c1a28f..0e33001 100644 --- a/docs/src/advanced.md +++ b/docs/src/advanced.md @@ -45,7 +45,7 @@ In the setting of missing response, the adjusted MM updates in each iteration a where ``\boldsymbol{Z}^{(t)}`` is the completed response matrix from the conditional mean and ``\boldsymbol{M}_i^{*(t)} = (\boldsymbol{I}_d \otimes \boldsymbol{1}_n)^T [(\boldsymbol{1}_d \boldsymbol{1}_d^T \otimes \boldsymbol{V}_i) \odot (\boldsymbol{\Omega}^{-(t)} \boldsymbol{P}^T \boldsymbol{C}^{(t)}\boldsymbol{P}\boldsymbol{\Omega}^{-(t)})] (\boldsymbol{I}_d \otimes \boldsymbol{1}_n)``, while ``\boldsymbol{R}^{*(t)}`` is the ``n \times d`` matrix such that ``\text{vec}\ \boldsymbol{R}^{*(t)} = \boldsymbol{\Omega}^{-(t)} \text{vec}(\boldsymbol{Z}^{(t)} - \boldsymbol{X} \boldsymbol{B}^{(t)})``. Additionally, ``\boldsymbol{P}`` is the ``nd \times nd`` permutation matrix such that ``\boldsymbol{P} \cdot \text{vec}\ \boldsymbol{Y} = \begin{bmatrix} \boldsymbol{y}_{\text{obs}} \\ \boldsymbol{y}_{\text{mis}} \end{bmatrix}``, where ``\boldsymbol{y}_{\text{obs}}`` and ``\boldsymbol{y}_{\text{mis}}`` are vectors of observed and missing response values, respectively, in column-major order, and the block matrix ``\boldsymbol{C}^{(t)}`` is ``\boldsymbol{0}`` except for a lower-right block consisting of the conditional variances. As seen, the two MM updates are of similar form. # Special case: ``m = 2`` -When there are ``m = 2`` variance components such that ``\boldsymbol{\Omega} = \boldsymbol{\Gamma}_1 \otimes \boldsymbol{V}_1 + \boldsymbol{\Gamma}_2 \otimes \boldsymbol{V}_2``, repeated inversion of the ``nd \times nd`` matrix ``\boldsymbol{\Omega}`` can be avoided and reduced to one ``d \times d`` generalized eigen-decomposition per iteration.
Without loss of generality, if we assume ``\boldsymbol{V}_2`` to be positive definite, the generalized eigen-decomposition of the matrix pair ``(\boldsymbol{V}_1, \boldsymbol{V}_2)`` yields generalized eigenvalues ``\boldsymbol{d} = (d_1, \dots, d_n)^T`` and generalized eigenvectors ``\boldsymbol{U}`` such that ``\boldsymbol{U}^T \boldsymbol{V}_1 \boldsymbol{U} = \boldsymbol{D} = \text{diag}(\boldsymbol{d})`` and ``\boldsymbol{U}^T \boldsymbol{V}_2 \boldsymbol{U} = \boldsymbol{I}_n``. Similarly, if we let the generalized eigen-decomposition of ``(\boldsymbol{\Gamma}_1^{(t)}, \boldsymbol{\Gamma}_2^{(t)})`` be ``(\boldsymbol{\Lambda}^{(t)}, \boldsymbol{\Phi}^{(t)})`` such that ``\boldsymbol{\Phi}^{(t)T} \boldsymbol{\Gamma}_1^{(t)} \boldsymbol{\Phi}^{(t)} = \boldsymbol{\Lambda}^{(t)} = \text{diag}(\boldsymbol{\lambda^{(t)}})`` and ``\boldsymbol{\Phi}^{(t)T} \boldsymbol{\Gamma}_2^{(t)} \boldsymbol{\Phi}^{(t)} = \boldsymbol{I}_d``, then the MM updates in each iteration become +When there are ``m = 2`` variance components such that ``\boldsymbol{\Omega} = \boldsymbol{\Gamma}_1 \otimes \boldsymbol{V}_1 + \boldsymbol{\Gamma}_2 \otimes \boldsymbol{V}_2``, repeated inversion of the ``nd \times nd`` matrix ``\boldsymbol{\Omega}`` can be avoided, replaced by a single ``d \times d`` generalized eigen-decomposition per iteration. Without loss of generality, if we assume ``\boldsymbol{V}_2`` to be positive definite, the generalized eigen-decomposition of the matrix pair ``(\boldsymbol{V}_1, \boldsymbol{V}_2)`` yields generalized eigenvalues ``\boldsymbol{d} = (d_1, \dots, d_n)^T`` and generalized eigenvectors ``\boldsymbol{U}`` such that ``\boldsymbol{U}^T \boldsymbol{V}_1 \boldsymbol{U} = \boldsymbol{D} = \text{diag}(\boldsymbol{d})`` and ``\boldsymbol{U}^T \boldsymbol{V}_2 \boldsymbol{U} = \boldsymbol{I}_n``.
Similarly, if we let the generalized eigen-decomposition of ``(\boldsymbol{\Gamma}_1^{(t)}, \boldsymbol{\Gamma}_2^{(t)})`` be ``(\boldsymbol{\Lambda}^{(t)}, \boldsymbol{\Phi}^{(t)})`` such that ``\boldsymbol{\Phi}^{(t)T} \boldsymbol{\Gamma}_1^{(t)} \boldsymbol{\Phi}^{(t)} = \boldsymbol{\Lambda}^{(t)} = \text{diag}(\boldsymbol{\lambda}^{(t)})`` and ``\boldsymbol{\Phi}^{(t)T} \boldsymbol{\Gamma}_2^{(t)} \boldsymbol{\Phi}^{(t)} = \boldsymbol{I}_d``, then the MM updates in each iteration become ```math \begin{aligned} diff --git a/src/MultiResponseVarianceComponentModels.jl b/src/MultiResponseVarianceComponentModels.jl index bb79ec4..861ffcf 100644 --- a/src/MultiResponseVarianceComponentModels.jl +++ b/src/MultiResponseVarianceComponentModels.jl @@ -1,6 +1,7 @@ """ -`MRVCModels.jl` permits maximum likelihood (ML) or residual maximum likelihood (REML) estimation -and inference for multivariate response variance components linear mixed models. +__*MRVCModels*__ stands for __*m*__ultivariate __*r*__esponse __*v*__ariance __*c*__omponents +linear mixed __*models*__. `MRVCModels.jl` permits maximum likelihood (ML) or residual +maximum likelihood (REML) estimation and inference. """ module MultiResponseVarianceComponentModels @@ -120,9 +121,11 @@ reml::Bool pursue REML estimation instead of ML estimation; default false ``` # Extended help -When there are two variance components, computation in each iteration can be -reduced, which is attained with `MRTVCModel` instance. `MRVCModel` is more general. -For `MRTVCModel`, the number of variance components must be two. +When there are two variance components, computation can be reduced by avoiding large matrix +inversion in each iteration, which is achieved with an `MRTVCModel` instance. __*MRTVCModels*__
+stands for __*m*__ultivariate __*r*__esponse __*t*__wo __*v*__ariance __*c*__omponents
+linear mixed __*models*__. `MRVCModel` is more general and is not limited to the two
+variance components setting.
For `MRTVCModel`, the number of variance components must be two. """ function MRVCModel( Y :: Union{AbstractMatrix{T}, AbstractMatrix{Union{Missing, T}}}, diff --git a/src/fit.jl b/src/fit.jl index 858cf9f..2363721 100644 --- a/src/fit.jl +++ b/src/fit.jl @@ -17,7 +17,7 @@ log::Bool record iterate history or not; default false # Extended help MM algorithm is provably faster than EM algorithm in this setting, so recommend trying -MM algorithm first, which is default, and switching to EM algorithm if there are +MM algorithm first, which is the default, and switching to EM algorithm if there are convergence issues. """ function fit!(
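The ``m = 2`` speedup documented in `advanced.md` rests on the simultaneous diagonalization ``\boldsymbol{U}^T \boldsymbol{V}_1 \boldsymbol{U} = \boldsymbol{D}`` and ``\boldsymbol{U}^T \boldsymbol{V}_2 \boldsymbol{U} = \boldsymbol{I}_n`` from the generalized eigen-decomposition of the pair ``(\boldsymbol{V}_1, \boldsymbol{V}_2)``. A minimal numerical sketch of that identity, written in Python with SciPy purely for illustration (the package itself is Julia; none of the names below come from `MRVCModels.jl`):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 5

# Symmetric V1 and positive definite V2, matching the WLOG assumption in the docs
A = rng.standard_normal((n, n))
V1 = A + A.T
B = rng.standard_normal((n, n))
V2 = B @ B.T + n * np.eye(n)

# Generalized eigen-decomposition of the pair (V1, V2):
# eigh solves V1 @ u = d * V2 @ u and normalizes U so that U.T @ V2 @ U = I
d, U = eigh(V1, V2)

# The two relations used by the m = 2 updates: U^T V1 U = D and U^T V2 U = I_n
assert np.allclose(U.T @ V1 @ U, np.diag(d))
assert np.allclose(U.T @ V2 @ U, np.eye(n))
```

Because ``\boldsymbol{\Omega} = \boldsymbol{\Gamma}_1 \otimes \boldsymbol{V}_1 + \boldsymbol{\Gamma}_2 \otimes \boldsymbol{V}_2`` is diagonalized by the Kronecker product of the two eigenvector matrices, only the small ``d \times d`` pair ``(\boldsymbol{\Gamma}_1^{(t)}, \boldsymbol{\Gamma}_2^{(t)})`` needs re-decomposing each iteration; the ``n \times n`` decomposition of ``(\boldsymbol{V}_1, \boldsymbol{V}_2)`` is computed once.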