Start recording to transcribe speech in real time
Remove silence and non-speech sections from a speech sample
Measure distances between speech files using HuBERT embeddings
Compare audio clips and get alignment details
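
One common way to get both a distance and alignment details between two clips is dynamic time warping (DTW) over per-frame embeddings. The sketch below is illustrative only and is not taken from any of the tools listed above: the function name `dtw_align` is hypothetical, cosine distance is an assumed frame-level metric, and the inputs are assumed to be arrays of frame embeddings (for example, HuBERT hidden states extracted beforehand).

```python
import numpy as np


def dtw_align(a, b):
    """Dynamic time warping between two sequences of frame embeddings.

    a: array of shape (n, d); b: array of shape (m, d).
    Returns (total_cost, path), where path is a list of (i, j) frame pairs.
    Hypothetical helper for illustration, not any tool's actual API.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # Pairwise cosine distance between every frame of a and every frame of b.
    a_norm = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b_norm = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    cost = 1.0 - a_norm @ b_norm.T

    # acc[i, j] = cheapest cumulative cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Backtrack from the end to recover which frames were matched.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return float(acc[n, m]), path
```

The returned cost can serve as a clip-level distance, and the path shows which frame of one clip was matched to which frame of the other. Dividing the cost by the path length is one option when clips of different durations need to be compared.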