This repository is less of a portfolio and more of a work in progress: a snapshot of my work in data science, machine learning, and statistical modeling. Some projects are polished, others are experiments that may or may not lead anywhere, and most sit somewhere in between.
I'm currently lost in the rabbit hole of translating abstract ideas into something tangible: optimizing models, automating processes, and mostly just figuring out why the latest iteration broke.
- Programming Languages: Python, R, SQL
- Data Analysis Tools: Jupyter Notebook, RStudio
- Machine Learning Libraries: TensorFlow, PyTorch, scikit-learn
- Database Management: PostgreSQL, MySQL, MongoDB
- Cloud Platforms: Google Cloud Platform (GCP), Amazon Web Services (AWS)
- Development Tools: Docker, VS Code
- Other Tools: Git, Firebase, Tableau
These aren’t necessarily my “best” projects, but they represent different points in my process.
- Advanced Data Processing Pipeline
- DNA Analysis Tool
- Feature Selection Framework
- Recommender Systems
- Synsearch
- GitHub Analyzers
The tree below is a high-level view; in reality, each directory is filled with experiments, half-finished ideas, and scripts that probably need refactoring.
Portfolio/
├── Artificial-Intelligence/
│ ├── content-processing/
│ ├── research-tools/
│ └── web-automation/
├── Data-Science-and-Analysis/
│ ├── advanced-data-processing/
│ ├── data-quality-facelift/
│ ├── dna-analysis/
│ ├── early-analysis/
│ └── github-analyzers/
├── Machine-Learning-and-Deep-Learning/
│ ├── basics/
│ │ └── ML_Basics_with_Backpropagation_and_Gradient_Descent.ipynb
│ ├── feature-selection-optuna-remix/
│ ├── computer-vision/
│ ├── nlp/
│ └── recommender-systems/
├── Documentation/
│ ├── guides/
│ └── references/
└── Miscellaneous/
  ├── admin/
  └── assets/
File/Directory | Summary |
---|---|
ML Basics | Machine learning fundamentals with backpropagation and gradient descent. |
Feature Selection Framework | Advanced feature selection framework combining PCA, LASSO, and Optuna optimization. |
CIFAR10 Analysis | Image classification using logistic regression on the CIFAR-10 dataset. |
Deep Learning Language | Normalization and translation for language projects. |
Language Modeling | Text analytics and language modeling techniques. |
LSTM Text Modeling | Text modeling using LSTM neural networks. |
NLTK Embeddings | Word sense disambiguation and embeddings using NLTK. |
Recommender System | Implementation of recommendation algorithms. |
PSID Web Scraping | Automated data retrieval from the PSID database. |
Web Summarizer | URL content summarization tool. |
AI Research Synthesizer | Research synthesis with NVIDIA API integration. |
Synsearch | Advanced research synthesis tool. |
Advanced Data Processing | Comprehensive data pipeline with cleaning and transformation. |
Data Quality Facelift | Data quality enhancement with Streamlit interface. |
DNA Analysis | Comprehensive genetic analysis tool with health traits, ancestry analysis, and interactive dashboard. |
GitHub Portfolio Analyzer | Analysis tool for GitHub portfolios. |
GitHub Repo Analyzer | Repository analysis and insights tool. |
Credit Risk Analysis | Statistical analysis of credit risk factors. |
Housing Analysis | Housing market and phishing data analysis. |
Student Placement | Predictive modeling for student placement. |
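
For a flavor of the gradient descent idea listed under ML Basics, here is a minimal sketch in plain Python. It is not taken from the notebook: the `fit_line` helper, the learning rate, the epoch count, and the toy data are all assumptions made up for illustration, and the example fits a simple line rather than a neural network.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration only; lr and epochs are
    arbitrary defaults that happen to work for the toy data below.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current fit.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1 (made up for this example).
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    w, b = fit_line(xs, ys)
    print(f"w={w:.3f}, b={b:.3f}")
```

On this toy data the fitted slope and intercept should land very close to the 2 and 1 used to generate it. A learning rate that is too large for the data's scale would make the updates diverge instead, which is usually the first thing worth checking when a run blows up.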