Repository generating diarized transcripts of the Lex Fridman Podcast for NLP analysis
This project aims to produce transcripts of the Lex Fridman Podcast and to run various NLP analysis tools on them, with the goal of extracting interesting information from this large corpus of fascinating conversations.
The raw text comes from YouTube's automatically generated transcripts. I downloaded the audio files and diarized them so that each intervention can be attributed to its speaker, in chronological order. The transcripts are not of the highest quality, but they may still be useful to some of you. If you're interested in helping improve their quality, please reach out!
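The repository's own alignment code is not described here, so the following is only a minimal sketch of one common way to combine the two sources. It assumes that every YouTube caption segment and every diarization turn carries start/end timestamps in seconds, and it assigns each caption to the speaker whose turn overlaps it the most, then joins consecutive captions from the same speaker into one intervention. The names `Caption`, `Turn`, `assign_speakers`, `merge_interventions` and the `SPEAKER_00`-style labels are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Caption:
    """Hypothetical caption segment from the YouTube transcript."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    """Hypothetical diarization turn: one speaker talking over [start, end)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(captions: List[Caption], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Label each caption with the speaker whose turn overlaps it the longest.

    Captions that overlap no turn at all are labeled "UNKNOWN".
    """
    labeled = []
    for cap in captions:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            ov = overlap(cap.start, cap.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, cap.text))
    return labeled


def merge_interventions(labeled: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Join consecutive captions from the same speaker into a single intervention."""
    merged: List[Tuple[str, str]] = []
    for speaker, text in labeled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged


if __name__ == "__main__":
    # Toy data for illustration only, not taken from a real episode.
    captions = [
        Caption(0.0, 2.5, "Welcome to the podcast."),
        Caption(2.5, 5.0, "Today I'm talking with"),
        Caption(5.0, 7.0, "Thanks for having me."),
    ]
    turns = [Turn(0.0, 4.8, "SPEAKER_00"), Turn(4.8, 7.5, "SPEAKER_01")]
    for speaker, text in merge_interventions(assign_speakers(captions, turns)):
        print(f"{speaker}: {text}")
```

Maximum overlap is a simple heuristic: a caption that straddles a speaker change is given entirely to one speaker, which is one reason caption-level alignment can misattribute a few words at turn boundaries.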
The NLP analysis part is still under development.