Full-Stack Software Development

AI Tutoring Assistant

My Role
Software Developer
Timeline
January 2023 - March 2023

Summary

https://www.tutoringassistant.ai/

A Tutoring Assistant that uses AI to answer HSC students' questions, help them learn, and generate essays.

Tech Stack

* React

* TailwindCSS

* Node.js

* NestJS

* PostgreSQL

* TypeORM

* Auth0

* Pinecone (managed vector database service)

* OpenAI (vector embeddings model, and GPT-3)

* Heroku

* AWS

Solution Design

Entity Relationship Diagram: https://dbdiagram.io/d/643787098615191cfa8d875a

Chat

* Student notes are stored as objects in an S3 bucket.

* Student notes are converted to HTML, parsed, broken up into chunked paragraphs, and each chunk is converted into a vector using OpenAI's embeddings model. Each vector is then stored in Pinecone, along with relevant metadata (plain text, keywords, subject, author, title of notes, etc).

* Student asks a question via the app.

* The question is embedded as a vector using OpenAI's embeddings model, and that embedding is compared against the stored notes embeddings in Pinecone using cosine similarity to return the top 5 most similar results. The subject the question relates to, along with keywords extracted from the student's question, is also applied as a filter against the Pinecone vector metadata. This makes it a hybrid search engine: it combines keyword search and vector search to return relevant results.

* Once the top 5 most similar results are returned, they are appended to a specific prompt sent to OpenAI's GPT-3, instructing it to answer the student's question using the returned results as context. In the AI field, this technique is referred to as Retrieval-Augmented Generation (RAG).

* Once GPT returns the answer, the final step is to add references so the student knows where to find more information on the topic. Each sentence in the answer is embedded, cosine similarity is computed between that sentence embedding and the embeddings of the top 5 retrieved results, and the result with the highest similarity score is labelled as that sentence's reference. While the cosine similarity for retrieval is handled by Pinecone automatically, this part of the feature was implemented in application code.

* The end result is that the student can ask any question for a particular subject and receive a highly accurate answer with references to notes for further reading.

* The same general technique as referenced above is used to return the Syllabus section the question relates to, as well as related questions for better UX.
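The hybrid retrieval step described above can be sketched as an in-memory toy version. In the real app, Pinecone performs the vector comparison and metadata filtering server-side; the types and function names here are illustrative, not the app's actual code.

```typescript
interface StoredVector {
  values: number[];
  metadata: { subject: string; keywords: string[]; text: string };
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function hybridSearch(
  query: number[],
  subject: string,
  keywords: string[],
  index: StoredVector[],
  topK = 5
): StoredVector[] {
  return index
    // Metadata filter first (the "keyword search" half)...
    .filter(v =>
      v.metadata.subject === subject &&
      keywords.some(k => v.metadata.keywords.includes(k)))
    // ...then rank by cosine similarity (the "vector search" half).
    .sort((x, y) =>
      cosineSimilarity(query, y.values) - cosineSimilarity(query, x.values))
    .slice(0, topK);
}
```

The keyword filter narrows the candidate set before the vector ranking, which is what keeps irrelevant but semantically-nearby chunks out of the results.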
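The reference-attribution step done in application code can be sketched like this: for each sentence of the answer, pick the retrieved chunk whose embedding scores highest. Names are illustrative, and the embedding calls to OpenAI are replaced with precomputed vectors.

```typescript
interface RetrievedChunk { title: string; embedding: number[] }

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function attachReferences(
  sentenceEmbeddings: number[][],  // one embedding per answer sentence
  chunks: RetrievedChunk[]         // the top-5 results from retrieval
): string[] {
  return sentenceEmbeddings.map(sentVec => {
    let best = chunks[0];
    let bestScore = -Infinity;
    for (const chunk of chunks) {
      const score = cosineSimilarity(sentVec, chunk.embedding);
      if (score > bestScore) { bestScore = score; best = chunk; }
    }
    // Label the highest-scoring chunk as this sentence's reference.
    return best.title;
  });
}
```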

Essay Generator

* At the time of creating this app, GPT-3 had a very limited context window, and limited token generation. Thus, generating an essay of possibly unlimited length required a bit of creativity.

* Firstly, the student enters their essay question. The app then provides suggested thesis points and quotes/evidence that relates to those thesis points, using a similar approach to the Chat feature (i.e. RAG). The student can also add their own thesis points / quotes too.

* Once they are happy with the selected thesis points / quotes for their essay, they click generate.

* In generating the essay, the introduction, body paragraphs and conclusion are generated separately to avoid hitting the token generation limit for the OpenAI GPT-3 API. The body paragraphs are generated in parallel via concurrent requests to the API, using the thesis points, quotes and a specific prompt for essay generation. After the introduction, body paragraphs and conclusion have been generated, they are appended to each other and returned.

* The student can then edit their essay using a custom text editor.
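The parallel generation flow above can be sketched as follows. `generateSection` stands in for a real OpenAI API call (it is a hypothetical parameter here); the point is that body paragraphs run concurrently via `Promise.all`, with each call staying under the GPT-3 token-generation limit.

```typescript
type Generate = (prompt: string) => Promise<string>;

async function generateEssay(
  question: string,
  thesisPoints: string[],
  generateSection: Generate
): Promise<string> {
  const intro = await generateSection(`Write an introduction for: ${question}`);
  // One concurrent request per thesis point.
  const bodies = await Promise.all(
    thesisPoints.map(t => generateSection(`Write a body paragraph arguing: ${t}`))
  );
  const conclusion = await generateSection(`Write a conclusion for: ${question}`);
  // Append the separately generated sections into the final essay.
  return [intro, ...bodies, conclusion].join("\n\n");
}
```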

Interesting Challenges / Observations

* How you store the vector embeddings matters a lot: the more information-dense your vectors are, the less lossy your results will be. In particular, prepending or appending keywords or titles to a chunk of text before embedding it as a vector produces much better similarity results, and hence much better answers from GPT (the more accurate the context provided, the more accurate the answer).

* You need surprisingly little data to get good similarity results - again, what is important is the information density of your stored vectors.

* It is difficult to get consistently accurate results with GPT, even with extremely explicit prompts. This was a problem both in terms of trying to improve answer accuracy, and also just trying to get it to do basic things like returning 3 related questions with no numbering or bullet points (sometimes it would add them even when explicitly instructed not to).

* At the time of creating this app, the OpenAI GPT API was under extremely heavy load. This made it important to have strong error handling in place, and to wrap all API requests with exponential backoff so they are retried on failure.

* The app landing page exposes public endpoints, so I implemented rate limiting on these to avoid any potential bot/spam attacks.

* Interestingly, managed vector database services were also very new when I was building this app. I chose Pinecone because it was one of the only ones available at the time. This decision came with two main challenges: 1) it was expensive, and 2) it did not roll back batched upserts when individual vectors failed (i.e. if I uploaded a batch of vectors and one of them failed, the vectors already committed in that batch were not rolled back automatically). I therefore implemented manual rollbacks in application code: each batch's vector job ID and processing status is stored in a database table, and if a batch fails, the app calls Pinecone's DELETE endpoint to remove all vectors associated with that batch, then automatically retries it on the next initiated run.

* If I were to develop a similar app in the future, I would not use a managed vector database service until I hit serious scale in terms of users. Instead, I would store all vector embeddings in a PostgreSQL table along with their relevant metadata, and compute cosine similarity either directly in application code or via a PostgreSQL extension. This would be sufficient for an MVP.
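The "prepend keywords or titles before embedding" observation above amounts to a small preprocessing step. This sketch shows one plausible format; the function name and layout are illustrative, not the app's actual code.

```typescript
// Densify a chunk before embedding: prepend the note title and keywords
// so the resulting vector carries topical signal the raw paragraph may lack.
function toEmbeddingInput(chunk: {
  text: string; title: string; keywords: string[];
}): string {
  // Format: "Title | keyword, keyword: paragraph text"
  return `${chunk.title} | ${chunk.keywords.join(", ")}: ${chunk.text}`;
}
```

The string returned here is what would be passed to the embeddings model in place of the raw paragraph text.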
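A minimal sketch of the exponential-backoff wrapper mentioned above. Real code would add jitter and distinguish retryable from non-retryable errors; the names and defaults here are illustrative.

```typescript
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up once the retry budget is exhausted.
      if (attempt >= maxRetries) throw err;
      // Delay doubles each attempt: 500ms, 1s, 2s, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```

Wrapping every OpenAI call in something like `withBackoff(() => callOpenAI(prompt))` meant transient overload errors were absorbed rather than surfaced to students.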
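The rate limiting on the public endpoints is handled at the framework level, but the underlying mechanism is essentially a token bucket, sketched here for illustration (the class and parameters are not the app's actual code).

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,         // max burst size
    private refillPerSecond: number,  // sustained request rate
    now: number = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if the request is allowed, false if rate-limited.
  allow(now: number = Date.now()): boolean {
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) { this.tokens -= 1; return true; }
    return false;
  }
}
```

One bucket per client IP is enough to blunt simple bot/spam traffic against a public endpoint.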
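The manual batch rollback described above can be sketched as follows: each batch upsert is tracked by a job ID, and if any vector in the batch fails, everything committed under that job ID is deleted and the job is marked for retry. The `VectorStore` interface is a hypothetical stand-in for Pinecone's upsert/delete endpoints and the status table.

```typescript
interface VectorStore {
  upsert(jobId: string, vectorId: string): Promise<void>;
  deleteByJob(jobId: string): Promise<void>;
}

type JobStatus = "completed" | "pending_retry";

async function upsertBatch(
  store: VectorStore,
  jobId: string,
  vectorIds: string[]
): Promise<JobStatus> {
  try {
    for (const id of vectorIds) {
      await store.upsert(jobId, id);
    }
    return "completed";
  } catch {
    // One failure rolls back the whole batch: delete everything already
    // committed under this job ID, and mark the job for the next run.
    await store.deleteByJob(jobId);
    return "pending_retry";
  }
}
```

A scheduled worker can then pick up any job still in `pending_retry` and re-run the batch from scratch.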

Results

* The app was used by over 150 students during their HSC year in 2023.

* Over 10,000 student questions were answered across 1350 separate chat sessions.

Demo

In-App Images