Summarizit
Introduction
We were tasked with creating a chatbot that can dissect and summarize YouTube videos, a capability no existing chatbot offered at the time. The project ran from January to June of 2024.
The Team
Leads
Kaushal Marimuthu — Project Mentor and interim Project Manager
Developers
Divya Vaddavalli
Benedict Nursalim
Aine Keenan
Rohan Sheth
Esha Dasari
Designers
Trung Vo
Sruthi Ramesh
Timeframe
Jan — Jun 2024 | 16 weeks
Tools
Figma, Next.js, TailwindCSS, MongoDB, Node.js, Flask, Google Gemini
The Project
Design
Design Decisions
We researched widely used chatbots such as ChatGPT and Claude, and collected user survey data. The end product was a simple but functional UI, built up gradually from lo-fi to hi-fi iterations. The screen features a toolbar on the left listing previous and existing chats for navigation. The user can log in to retrieve these chats or create a new one, which is saved and can be reopened later. The messaging screen is clean and easy to read while still offering the functionality a chatbot needs, along with suggestions of commonly asked questions.
Development
Frontend: Next.js (TypeScript) website styled with TailwindCSS
Backend (Data): MongoDB, with collections for User, Chat, and Message
Backend (Model): Google Gemini, accessed through a Flask server
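As a rough sketch of how the three collections might relate, the shapes below use hypothetical field names (not the project's actual schema): each Chat references its owning User, and each Message references its parent Chat.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical document shapes for the User, Chat, and Message
# collections; field names are illustrative, not the actual schema.

@dataclass
class User:
    user_id: str
    email: str
    chat_ids: List[str] = field(default_factory=list)  # chats owned by this user

@dataclass
class Chat:
    chat_id: str
    user_id: str          # reference back to the owning User
    title: str = "New chat"

@dataclass
class Message:
    message_id: str
    chat_id: str          # reference to the parent Chat
    role: str             # "user" or "model"
    text: str = ""

def messages_for_chat(messages: List[Message], chat_id: str) -> List[Message]:
    """A chat's transcript is just its messages filtered by chat_id."""
    return [m for m in messages if m.chat_id == chat_id]
```

Keeping messages in their own collection (rather than embedded in the chat document) makes it cheap to load a chat list without pulling full transcripts.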
Development Decisions
We chose Next.js for the advantages and ease it offers over plain React in routing, fonts, and similar concerns. The frontend prompts the user to log in or sign up, and upon successful login it queries the backend for all of the user's chats. When the user selects an existing or new chat, the frontend fetches that chat's messages and displays them. When the user sends a message, the frontend posts the entire existing message history to the Flask server, which returns Gemini's response.
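The server-side half of that flow might look roughly like this: convert the stored history into the role/parts chat format the Gemini SDK expects, append the new user message, and send it off. This is a hypothetical sketch, not the project's actual code, and the Gemini call itself is stubbed out so no API key is needed.

```python
# Illustrative sketch of preparing chat history for Gemini;
# function and field names here are assumptions, not project code.

def build_gemini_history(stored_messages):
    """stored_messages: list of {"role": "user"|"model", "text": str}
    from the database, oldest first."""
    return [
        {"role": m["role"], "parts": [m["text"]]}
        for m in stored_messages
    ]

def answer(stored_messages, new_user_text, model=None):
    """Build the full payload for Gemini; with no model, return the
    payload we would send instead of calling the API."""
    history = build_gemini_history(stored_messages)
    history.append({"role": "user", "parts": [new_user_text]})
    if model is None:
        return history
    # With a real google-generativeai client it would be roughly:
    # chat = model.start_chat(history=history[:-1])
    # return chat.send_message(new_user_text).text
    return model.generate(history)
```

Sending the whole history on every request keeps the Flask server stateless, at the cost of larger payloads as chats grow.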
Takeaways & Challenges
Losing Members — Adaptability
One main challenge we had was losing some members due to personal reasons. Losing members is always hard, but we were able to pick up the slack and persevere through it to deliver a strong MVP by the end of the project. People had to fill in roles unexpectedly, and we all learned the importance of adaptability and resilience.
ML Model — Always zoom out
We spent a long time researching different summarization models, eventually settling on a HuggingFace architecture. While this worked, it did not fully satisfy our needs, as it was unable to produce longer summaries. Even though we had already sunk a lot of time into the HuggingFace approach, we decided to pivot to Gemini, which offered a higher response token limit. By zooming out instead of staying narrowly committed to our original choice, we realized we had a better option and significantly improved our MVP.
This project taught all of us a lot about working in cross-functional teams, owning parts of a project, and delivering despite constant change and blockers. We are very proud of what we accomplished over these past few months, and of the cohesive team we became by the end of the cohort.