A lightweight, customizable chatbot for Telegram running on an ESP32 microcontroller. It's optimized for low-resource environments and embedded systems projects.

engares/KNN-Based-Telegram-Chatbot-hosted-in-ESP32

KNN-Based-Telegram-Chatbot-hosted-in-ESP32

Welcome to this small NLP project! This is Charlie, a little chatbot that can be hosted on a microcontroller. It uses a k-nearest neighbors (k-NN) model together with TF-IDF (Term Frequency-Inverse Document Frequency) to generate intelligent responses. The bot can learn from new data and interacts with users via Telegram.

Demo video: Real.Time.conversation.mp4

Features

  • Natural Interaction on Telegram: A smooth chat experience with a bot that runs efficiently on low-resource devices.
  • Synonym and Variance Tolerance: The bot recognizes synonyms and variations, allowing it to understand different ways of asking the same question.
  • Learning Capability: Users can train the bot with new phrases and responses using simple commands.
  • Highly Customizable with Small Data: Upload your own training data in a .txt file (details under Setting Up the Project).

How It Works

The bot uses a combination of text processing and machine learning algorithms to generate intelligent responses in real time. All data is stored in the ESP32's flash memory, making it a self-contained solution. Here's a summarised explanation:

  • Text Processing: The bot first cleans and normalizes user input by breaking down the text into tokens (words) and applying stemming techniques to focus on the root form of each word. The data is also augmented with a dictionary of common synonyms. This ensures that different forms of a word or similar words (e.g., "running" vs. "run", "movies" vs "films") are treated as the same concept.

  • TF-IDF & k-NN Algorithms:

    • TF-IDF (Term Frequency-Inverse Document Frequency): This algorithm weights each word in the user's input by how often it appears in that input and how rare it is across the entire knowledge base. Words that are frequent in the input but uncommon elsewhere get higher weights.
    • k-NN (k-Nearest Neighbors): The bot then uses the k-NN algorithm to compare the processed input against all stored interactions, calculating the cosine similarity between vectors representing these interactions. The bot selects the response corresponding to the most similar previous interaction.
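The text processing and TF-IDF steps above can be sketched in standard C++. This is a minimal illustration, not the code in knn.cpp: the function names (tokenize, stem, normalize, inverseDocumentFrequency, tfidf) are hypothetical, the suffix stripping is a crude stand-in for a real stemmer, and the IDF formula ln(N / df) is one common variant assumed here.

```cpp
#include <cctype>
#include <cmath>
#include <map>
#include <set>
#include <string>
#include <vector>

typedef std::map<std::string, double> SparseVector;  // term -> weight

// Hypothetical helper: lowercase the text and split on non-alphanumerics.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Hypothetical helper: crude suffix stripping, NOT a full stemming algorithm.
std::string stem(const std::string& word) {
    static const char* const suffixes[] = {"ing", "ed", "s"};
    for (const char* suffix : suffixes) {
        const std::string s(suffix);
        if (word.size() > s.size() + 2 &&
            word.compare(word.size() - s.size(), s.size(), s) == 0) {
            std::string root = word.substr(0, word.size() - s.size());
            const std::size_t n = root.size();
            // Undo consonant doubling left by "ing"/"ed" ("runn" -> "run").
            if (s != "s" && n >= 2 && root[n - 1] == root[n - 2] &&
                std::string("aeioulsz").find(root[n - 1]) == std::string::npos) {
                root.erase(n - 1);
            }
            return root;
        }
    }
    return word;
}

// Tokenize, stem, then map each root through the synonym dictionary.
std::vector<std::string> normalize(
    const std::string& text,
    const std::map<std::string, std::string>& synonyms) {
    std::vector<std::string> out;
    for (const std::string& token : tokenize(text)) {
        const std::string root = stem(token);
        std::map<std::string, std::string>::const_iterator it = synonyms.find(root);
        out.push_back(it != synonyms.end() ? it->second : root);
    }
    return out;
}

// idf(term) = ln(N / df(term)), where df counts documents containing the term.
std::map<std::string, double> inverseDocumentFrequency(
    const std::vector<std::vector<std::string> >& docs) {
    std::map<std::string, int> df;
    for (const std::vector<std::string>& doc : docs) {
        const std::set<std::string> unique(doc.begin(), doc.end());
        for (const std::string& term : unique) ++df[term];
    }
    std::map<std::string, double> idf;
    for (const std::pair<const std::string, int>& entry : df) {
        idf[entry.first] = std::log(static_cast<double>(docs.size()) / entry.second);
    }
    return idf;
}

// Weight = (term count / document length) * idf; unseen terms get weight 0.
SparseVector tfidf(const std::vector<std::string>& doc,
                   const std::map<std::string, double>& idf) {
    SparseVector v;
    for (const std::string& term : doc) v[term] += 1.0 / doc.size();
    for (std::pair<const std::string, double>& entry : v) {
        std::map<std::string, double>::const_iterator it = idf.find(entry.first);
        entry.second *= (it != idf.end()) ? it->second : 0.0;
    }
    return v;
}
```

With a synonym entry mapping "film" to "movie", both "movies" and "films" normalize to "movie", and "running" and "run" both normalize to "run". A word that appears in every stored interaction gets an IDF of 0 under this formula, so it contributes nothing to the comparison.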

Implementation of Cosine Similarity

To determine the most relevant response, the bot calculates the cosine similarity between the input vector and each interaction vector in its knowledge base. Cosine similarity is the cosine of the angle between two vectors: a value close to 1 means the vectors point in nearly the same direction (very similar texts), while 0 means they share no terms.
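As a rough sketch of this step, the cosine similarity of vectors A and B is (A · B) / (|A| |B|), and the bot's reply comes from the stored interaction with the highest score. The code below is an illustration in standard C++, not the implementation in knn.cpp: the SparseVector type, the Interaction struct, and the function names are assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, double> SparseVector;  // term -> TF-IDF weight

// cos(a, b) = dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros.
double cosineSimilarity(const SparseVector& a, const SparseVector& b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (const std::pair<const std::string, double>& entry : a) {
        normA += entry.second * entry.second;
        SparseVector::const_iterator it = b.find(entry.first);
        if (it != b.end()) dot += entry.second * it->second;
    }
    for (const std::pair<const std::string, double>& entry : b) {
        normB += entry.second * entry.second;
    }
    if (normA == 0.0 || normB == 0.0) return 0.0;
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}

// Hypothetical record for one stored interaction.
struct Interaction {
    SparseVector vector;   // vectorized user message
    std::string response;  // reply associated with that message
};

// Returns the index of the most similar interaction (nearest neighbor),
// or -1 when nothing in the knowledge base overlaps with the query.
int nearestInteraction(const SparseVector& query,
                       const std::vector<Interaction>& knowledgeBase) {
    int best = -1;
    double bestScore = 0.0;
    for (std::size_t i = 0; i < knowledgeBase.size(); ++i) {
        const double score = cosineSimilarity(query, knowledgeBase[i].vector);
        if (score > bestScore) {
            bestScore = score;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Returning -1 for a query with no overlap lets the caller send a fallback reply instead of an arbitrary answer; whether the actual bot does this is not stated in this README.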

Simple example of Cosine Similarity

[Figure: two simplified documents represented as vectors of word counts, and their cosine similarity]

Source of the image: Vu, N. Q., & Bezemer, C. (2021). Improving the Discoverability of Indie Games by Leveraging their Similarity to Top-Selling Games. Research Gate. https://doi.org/10.1145/3472538.3472548

Note: The training data used by this bot is synthetic and was generated using an LLM. This allows for a controlled, low-cost dataset but may not perfectly replicate all real-world scenarios.

Getting Started


ESP32-S3-VROOM N16R8

Prerequisites

To get started with this project, you'll need the following:

  • An ESP32 Board: For this project I used an ESP32-S3-VROOM N16R8 board. It should be compatible with other ESP32 boards. I got mine for ~$4 on AliExpress.
  • Arduino IDE 1.8.X: The project requires Arduino IDE version 1.8.X. Later versions are NOT compatible with the ESP32FS tool.
  • ESP32 Board Support: Install the ESP32 board support in Arduino IDE.
  • ESP32FS Tool: This project uses the ESP32FS tool to upload data to the ESP32's SPIFFS (SPI Flash File System). Make sure to install this plugin in your Arduino IDE.

Setting Up the Project

  1. Clone the Repository:

    git clone https://github.com/engares/KNN-Based-Telegram-Chatbot-hosted-in-ESP32.git
    cd KNN-Based-Telegram-Chatbot-hosted-in-ESP32
  2. Open the Project in Arduino IDE:

    • Open main.ino in the Arduino IDE.
  3. Configure WiFi and Telegram in main.ino:

    • Update the ssid and password variables with your WiFi credentials
    • Add your Telegram bot token in the botToken variable.
  4. Upload the Code:

    • Connect your ESP32 to your computer.
    • Select the correct board and port in Arduino IDE. (Additional configuration may be required, please check a simple guide here)
    • Upload the code to your ESP32.
  5. Upload Data to SPIFFS:

    • To personalize the chatbot's responses, add your own training data file (trainingData.txt) to the data folder. Otherwise, you can use the original file.

    • Make sure it's formatted like this:

      # [User message] -> [Bot's response]
      
      # Examples
      Hey! -> Hello! How can I help you today? 
      Where is the store located? -> You can find us in 123 Fake Street.
      ...  -> ...
    • Note that the amount of data the ESP32 can handle depends on the memory available on your ESP32 model. The original data contains 17753 characters.

    • Use the "ESP32 Sketch Data Upload" tool to upload this data to the ESP32.
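To make the training data format from step 5 concrete, here is a minimal sketch of how one line of trainingData.txt could be split into a message and a response. It is written in standard C++ for illustration and is not taken from knn.cpp: TrainingPair, trim, and parseTrainingLine are hypothetical names, and treating lines that start with # as comments is an assumption based on the example above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical container for one parsed line of training data.
struct TrainingPair {
    std::string input;     // text before "->"
    std::string response;  // text after "->"
};

// Remove leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* const whitespace = " \t\r\n";
    const std::size_t first = s.find_first_not_of(whitespace);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}

// Parses "User message -> Bot's response".
// Returns false for blank lines, '#' comment lines (assumed), lines without
// the "->" separator, and lines where either side is empty.
bool parseTrainingLine(const std::string& line, TrainingPair& out) {
    const std::string text = trim(line);
    if (text.empty() || text[0] == '#') return false;
    const std::string separator = "->";
    const std::size_t pos = text.find(separator);
    if (pos == std::string::npos) return false;
    out.input = trim(text.substr(0, pos));
    out.response = trim(text.substr(pos + separator.size()));
    return !out.input.empty() && !out.response.empty();
}
```

The sketch splits on the first "->" only, so a response may itself contain "->" but a user message may not.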

Using the Bot

  • Start the Bot: After uploading, the bot will connect to WiFi and be ready to receive messages via Telegram.
  • Just Chat!
  • Additional Commands:
    • /start: See the welcome message.
    • /help: Get a list of commands and help information.
    • /about: Learn more about how the bot works.
    • /train input -> response: Train the bot with new phrases.
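The commands above suggest a simple dispatch step: messages that start with a known command are handled specially, and everything else goes to the k-NN matcher. The following standard C++ sketch shows one way to do that. The Command enum and classifyMessage function are hypothetical and do not come from main.ino; routing unknown commands to normal chat is also an assumption.

```cpp
#include <cstddef>
#include <string>

enum class Command { Start, Help, About, Train, Chat };

// Classifies an incoming message. For commands, `argument` receives the text
// after the command name (for /train, the "input -> response" part).
// For ordinary chat, `argument` receives the whole message.
Command classifyMessage(const std::string& message, std::string& argument) {
    argument.clear();
    if (message.empty() || message[0] != '/') {
        argument = message;
        return Command::Chat;
    }
    const std::size_t space = message.find(' ');
    const std::string name = message.substr(0, space);
    if (space != std::string::npos) argument = message.substr(space + 1);

    if (name == "/start") return Command::Start;
    if (name == "/help") return Command::Help;
    if (name == "/about") return Command::About;
    if (name == "/train") return Command::Train;

    // Assumption: unknown commands are treated as ordinary chat text.
    argument = message;
    return Command::Chat;
}
```

For a /train message, the returned argument has the same "input -> response" shape as a line of trainingData.txt, so it could be parsed the same way before being stored.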

Project Structure

  • main.ino: The main Arduino sketch that initializes the bot and handles communication with Telegram.
  • knn.h and knn.cpp: These files contain the k-nearest neighbors logic, TF-IDF calculations, and text processing functions.
  • data/: Contains the training data file (trainingData.txt) that is uploaded to SPIFFS.
  • .ign/: Images and stuff for this README.

Contributing

Feel free to open issues or submit pull requests with improvements, bug fixes, or new features :)
