I'm Shamik and I enjoy building solutions to problems, mostly through programming (and occasionally with WD-40). I work as a Lead Data Scientist building machine learning applications for detecting and anonymizing PII and PHI in data breaches. I am also a part-time contributor to the BigScience Workshop, the BigBIO effort and the BigCode Project from 🤗. In addition, I am working with PIISA, a collection of data scientists, software developers and lawyers, to establish an open standard for PII protection that can be used across the globe. You can follow our efforts here. I also like to cook 👨‍🍳
├── Interests
│   ├── Natural Language Processing
│   ├── Explainable Machine Learning
│   ├── AI Ethics
│   ├── System Design
│   └── PII Anonymization
├── Occupations
│   ├── Software Engineer
│   ├── Graduate Research Assistant
│   ├── Lead Data Scientist
│   └── Senior Researcher
├── Locations
│   ├── Kolkata, India
│   ├── Boston, MA, USA
│   ├── Tallahassee, FL, USA
│   └── Leeds, England
└── Book Suggestions
    ├── Fiction
    │   ├── The Three-Body Problem - Cixin Liu
    │   ├── All the Light We Cannot See - Anthony Doerr
    │   └── Purple Hibiscus - Chimamanda Ngozi Adichie
    ├── Non-Fiction
    │   ├── Algorithms of Oppression - Safiya Umoji Noble
    │   ├── Braiding Sweetgrass - Robin Wall Kimmerer
    │   ├── The Chaos Machine - Max Fisher
    │   ├── Viral Justice - Ruha Benjamin
    │   └── Weapons of Math Destruction - Cathy O'Neil
    └── Cookbooks
        ├── The Food Lab - J. Kenji Lopez-Alt
        ├── Mi Cocina - Rick Martinez
        └── Dessert Person - Claire Saffitz
Publications
- Explaining AI for Malware Detection: Analysis of Mechanisms of MalConv
- PhD Thesis: Towards Explainability in Machine Learning for Malware Detection
- Static Malware Modeling and Detection using Topic Models
- BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing
- The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
P.S. The tree was built using Rich
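
For anyone curious, a tree like the one above can be sketched with Rich's `Tree` class. This is a minimal illustration rather than the exact script behind this page: the `PROFILE` dictionary is abbreviated, the helper names `add_nodes` and `build_tree` are made up for this example, and the root label `"About"` is a placeholder, since the root of the original tree is not shown.

```python
from rich.console import Console
from rich.tree import Tree

# Entries copied from the tree above, abbreviated to two branches.
PROFILE = {
    "Interests": [
        "Natural Language Processing",
        "Explainable Machine Learning",
        "AI Ethics",
        "System Design",
        "PII Anonymization",
    ],
    "Book Suggestions": {
        "Cookbooks": [
            "The Food Lab - J. Kenji Lopez-Alt",
            "Mi Cocina - Rick Martinez",
            "Dessert Person - Claire Saffitz",
        ],
    },
}


def add_nodes(parent: Tree, data) -> None:
    """Attach dict keys as branches and list items as leaves, recursively."""
    if isinstance(data, dict):
        for label, children in data.items():
            # Tree.add returns the new child node, so we can recurse into it.
            add_nodes(parent.add(label), children)
    else:
        for leaf in data:
            parent.add(leaf)


def build_tree(root_label: str = "About") -> Tree:
    # "About" is a hypothetical root label, not taken from the original page.
    tree = Tree(root_label)
    add_nodes(tree, PROFILE)
    return tree


if __name__ == "__main__":
    Console().print(build_tree())
```

Keeping the content in a plain nested dictionary means new branches can be added without touching the rendering code.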