Mutual information (MI) is a fundamental measure of statistical dependence between random variables, and its estimation has attracted significant research attention owing to its wide use in feature selection, clustering, representation learning, and information-theoretic deep learning. Classical approaches such as binning, k-nearest-neighbor (kNN) estimators, and kernel density estimation (KDE) provided early foundations but suffer from bias and poor scalability in high dimensions. To address these limitations, recent research explores variational bounds (e.g., MINE, Mutual Information Neural Estimation), contrastive learning frameworks (e.g., InfoNCE), and adversarial estimation techniques that use neural networks to approximate MI efficiently in high-dimensional spaces. These methods have been applied successfully in self-supervised representation learning, generative modeling, causal discovery, and reinforcement learning, where they often outperform traditional estimators. Ongoing studies also focus on improving estimator stability, reducing sample complexity, and extending MI estimation to structured data such as graphs, sequences, and multimodal inputs, positioning it as a critical tool for modern machine learning and artificial intelligence applications.
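
To make the neural bounds mentioned above concrete, the following is a minimal sketch, not any paper's reference implementation, assuming PyTorch is available. It trains a small critic network on samples from a correlated bivariate Gaussian, whose true MI is known in closed form, and reports two neural lower bounds on MI: the Donsker-Varadhan bound used by MINE and the InfoNCE contrastive bound. All names (sample_batch, score_matrix, the critic architecture, and the hyperparameters) are illustrative choices, not prescribed by the text.

```python
# Minimal sketch of MINE-style (Donsker-Varadhan) and InfoNCE lower bounds on MI.
# Toy problem: correlated Gaussian (x, y) with known analytic MI.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
rho = 0.8                                    # correlation of the toy Gaussian
true_mi = -0.5 * math.log(1.0 - rho ** 2)    # analytic MI in nats

def sample_batch(n):
    """Draw n correlated (x, y) pairs: y = rho*x + sqrt(1 - rho^2)*noise."""
    x = torch.randn(n, 1)
    y = rho * x + math.sqrt(1.0 - rho ** 2) * torch.randn(n, 1)
    return x, y

# Critic T(x, y): a small MLP scoring how "joint-like" a pair looks.
critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def score_matrix(x, y):
    """Return the (n, n) matrix of critic scores for every (x_i, y_j) pair."""
    n = x.shape[0]
    xx = x.unsqueeze(1).expand(n, n, 1)
    yy = y.unsqueeze(0).expand(n, n, 1)
    return critic(torch.cat([xx, yy], dim=-1).reshape(n * n, 2)).reshape(n, n)

for step in range(2001):
    x, y = sample_batch(128)
    scores = score_matrix(x, y)
    joint = scores.diagonal()                 # scores of true (x_i, y_i) pairs
    n = scores.shape[0]
    # Donsker-Varadhan (MINE-style) bound: I(X;Y) >= E_p[T] - log E_{p(x)p(y)}[exp T].
    # All n*n pairs approximate samples from the product of marginals
    # (the n diagonal joint pairs add only a small bias in this sketch).
    dv = joint.mean() - (torch.logsumexp(scores.reshape(-1), dim=0)
                         - math.log(n * n))
    # InfoNCE bound: I(X;Y) >= E_i[ T(x_i, y_i) - log (1/n) sum_j exp T(x_i, y_j) ],
    # capped at log n by construction.
    infonce = (joint - torch.logsumexp(scores, dim=1) + math.log(n)).mean()
    loss = -dv                                # train the critic to maximize the DV bound
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step}: DV={dv.item():.3f}  "
              f"InfoNCE={infonce.item():.3f}  true MI={true_mi:.3f}")
```

With rho = 0.8 the true MI is about 0.51 nats, and both bounds should climb toward that value as the critic trains; the example also illustrates the trade-off noted in the literature, where the InfoNCE estimate is lower-variance but saturates at log(batch size), while the DV bound is unbounded but noisier.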