We are using multi-modal models to help visually impaired people navigate and understand their surroundings.