My research strengthens the generalization and safety of generative AI, spanning vision models, LLMs, and VLMs. Toward this goal, I work on:
- Generalizable multimodal representation learning: foundation models for table recognition (UniTable, Table Transformer, Self-supervised Pretraining), RGB-infrared fusion object tracking (DsiamMFT, SiamFT), and structural health monitoring (system identification).
- Safe and robust machine learning models: the LLM safety landscape (LLM Safety Basin), robust CNN design principles (#1 on RobustBench CIFAR-10), multi-task person tracking (SkeleVision), and defending against LLM attacks (LLM Self Defense).
- Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models, NeurIPS'24 - [paper] [code]
- UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining, NeurIPS'24 Workshop - [paper] [code]
- Self-Supervised Pre-Training for Table Structure Recognition Transformer, AAAI'24 Workshop (Oral) - [paper] [code]
- High-Performance Transformers for Table Structure Recognition Need Early Convolutions, NeurIPS'23 Workshop (Oral) - [paper] [code]
- Robust Principles: Architectural Design Principles for Adversarially Robust CNNs, BMVC'23 (Best Poster Award) - [paper] [code]
- SkeleVision: Towards Adversarial Resiliency of Person Tracking with Multi-Task Learning, ECCV'22 Workshop - [paper] [code]