This project leverages GPT-3.5 to synthesize medical records, aiming to provide a compliant solution for researchers needing access to patient data while respecting privacy laws like HIPAA and GDPR.
main.py
: Main function executiondata_generation.py
: Synthetic patient profile and record generationgpt_processing.py
: GPT model utilization for record creationfile_management.py
: Transformation of GPT texts into CSVrequired_libraries.py
: Library imports
- Evaluate synthetic medical record generation feasibility and accuracy
- Process analysis from prompt engineering to database-ready notes
- Apply findings in language model research and prototyping
- Prompt Engineering: Develop prompts for GPT-3.5 to accurately generate medical data
- Data Generation: Create patient profiles and detailed medical records
- Data Processing: Convert generated texts to structured CSV format
- Generated 1079 balanced patient records by nationality and gender
- Noted realistic age distributions and disease prevalences
- Addressed potential biases and the importance of reliable data sources
- Enhancing NLP tasks in healthcare
- Supporting medical education and research
- Prototype testing within privacy regulations
The project highlights the potential of GPT-3.5 in synthesizing realistic medical records, providing a valuable tool for medical research within ethical and legal standards.
Install dependencies listed in required_libraries.py
. Run main.py
to begin synthetic medical record generation. Follow component-specific instructions for detailed operations.
Read more about this project here