Enhanced Telegram's callback_data with protobuf + base85

If you’ve ever developed a Telegram bot, you probably know what callback_data is. If not, in short, it’s an arbitrary string attached to an inline keyboard button; Telegram sends it back to your backend when the button is pressed, so you can tell which button it was.

As your bot grows, your callback_data can become messy. This is something I have experienced. Today, I want to share a new way to handle this problem.

What’s wrong with callback_data?

I assume you already know the Bot API. To understand the issue better, let’s look at some examples. I’ll write the code in Scala, but the basic idea works with any programming language.

Some frameworks restrict access to that parameter and manage it themselves. If that’s your case, then this article isn’t for you 🌞

Imagine you have a bot that manages a list of something. Let’s say it’s a list of goods:

Your application has a /ls command, which makes the bot respond with a message containing a numbered list of goods and an inline_keyboard for choosing a good to interact with.
Each button has callback_data set to info_${id}, where ${id} is the good’s id.
When a user clicks a button, your bot responds with a message that contains information about the chosen good and an inline_keyboard with buttons like “Delete”, “Buy”, and “Assign a category”. Respectively, these buttons have callback_data set to delete_${id}, buy_${id}, and assign_category_${id}.
When a user clicks “Assign a category”, your bot displays a hardcoded numbered list of categories, again with an inline_keyboard for choosing the category to assign. These buttons have callback_data like assign_category_${id}_${categoryId}, where ${id} is the good’s id and ${categoryId} is the chosen category.
And now imagine that you also have to update your initial info message after assigning a category. Now your callback_data becomes, at the very least, something like assign_category_${id}_${categoryId}_${messageId} 🤡.

While it’s fine to handle something like info_${id}, buy_${id}, and delete_${id}, more complex scenarios like assign_category_${id}_${categoryId}_${messageId} look weird and are hard to manage, especially when you have a lot of them.
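To make the pain concrete, here is a minimal sketch of what hand-parsing such a string tends to look like. The `AssignCategory` case class and the `PlainCallbackParser` object are hypothetical names used only for illustration, and the ids are assumed to be numeric.

```scala
// Hypothetical model of the parsed callback; the names are illustrative only.
final case class AssignCategory(id: Long, categoryId: Long, messageId: Long)

object PlainCallbackParser {
  private val Prefix = "assign_category_"

  // Parses "assign_category_<id>_<categoryId>_<messageId>" by hand.
  // Returns None for any other shape or for non-numeric fields.
  def parseAssignCategory(data: String): Option[AssignCategory] =
    if (!data.startsWith(Prefix)) None
    else
      data.stripPrefix(Prefix).split("_", -1) match {
        case Array(id, categoryId, messageId) =>
          for {
            i <- id.toLongOption
            c <- categoryId.toLongOption
            m <- messageId.toLongOption
          } yield AssignCategory(i, c, m)
        case _ => None
      }
}
```

Note that assign_category_${id} and assign_category_${id}_${categoryId} share the same prefix, so every handler has to tell them apart by counting fields, and every new field means touching both the code that builds the string and the code that parses it.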

Moreover, malicious users may try to attack your application by sending unexpected callback_data content, and a plain, predictable format makes that much easier for them. Of course, you must check access regardless of the format you’re using, but still, an opaque format is one more security wall.

How to fix this mess using protobuf + base85

In my Advanced Link Saver bot (a small article about tech stack), I have a lot of complex scenarios, and handling them was a nightmare. That’s why I implemented the following approach to manage callback_data:

Describe every callback using a protobuf message.
callback_data is no longer a plain string like info_${id}, but base85-encoded protobuf bytes.
Handlers decode the base85 string and then parse the underlying protobuf message.
Finally, you match the result against type-safe protobuf messages.

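base85 (also known as Ascii85) is just a way to encode bytes as a string of printable ASCII characters. You could also use good old base64 here, but base85 is more size-efficient: it encodes every 4 bytes as 5 characters, while base64 encodes every 3 bytes as 4 characters. That matters because callback_data is limited to 64 bytes.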
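The size difference can be estimated with a little arithmetic. The sketch below assumes padded base64 and plain Ascii85 without the optional `<~ ~>` delimiters or the `z` shortcut for all-zero groups; `EncodedSize` and its methods are hypothetical names, not part of any library.

```scala
object EncodedSize {
  // Padded base64: every 3 input bytes (or a final partial group) become 4 characters.
  def base64Length(n: Int): Int = 4 * ((n + 2) / 3)

  // Ascii85 (assumed: no delimiters, no 'z' shortcut): every full 4-byte group
  // becomes 5 characters, and a trailing group of k bytes becomes k + 1 characters.
  def base85Length(n: Int): Int = {
    val tail = n % 4
    5 * (n / 4) + (if (tail == 0) 0 else tail + 1)
  }

  // Largest payload size in bytes whose encoding still fits into `limit` characters.
  // Both encodings emit ASCII only, so one character is one byte.
  def maxPayload(limit: Int, encodedLength: Int => Int): Int =
    Iterator.from(0).takeWhile(n => encodedLength(n) <= limit).toList.last
}
```

Under these assumptions, `EncodedSize.maxPayload(64, EncodedSize.base64Length)` gives 48 bytes and `EncodedSize.maxPayload(64, EncodedSize.base85Length)` gives 51 bytes. The gain is modest, but a few extra bytes can be enough for one more protobuf field.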

Let’s see how it looks in code. For example, I have callbacks for viewing information about links and about categories. My protobuf definitions look like this:

// Fields without a label require proto3 syntax
syntax = "proto3";

message InfoCategory {
  uint32 categoryId = 1;
}

message InfoLink {
  uint32 linkId = 1;
}

message Info {

  oneof callbackData {
    InfoLink infoLink = 1;
    InfoCategory infoCategory = 2;
  }

}

Next, all we need to do is decode the base85 string into bytes and then try to parse those bytes as a protobuf message. You can do it in any programming language; in my case, such a handler looks like this in Scala:

class InfoCallbackHandler() {
  override def handle(callback: CallbackQuery) =
    (
      callback
        .data // this variable is a plain `callback_data` string
        .flatMap(ProtobufUtils.fromBase85String[Info])
        .map(_.callbackData)
    ) match {
      case Some(Info.CallbackData.InfoCategory(InfoCategory(categoryId))) =>
        // ...

      case Some(Info.CallbackData.InfoLink(InfoLink(linkId))) =>
        // ...

      case _ =>
        ZIO.fail(new IllegalArgumentException())
    }
}

This is how I set callback_data on buttons:

val infoButton = InlineKeyboardButton(
  text = "Info",
  callbackData = Some(
    ProtobufUtils.toBase85String(
      Info(
        Info.CallbackData.InfoLink(InfoLink(link.id /* int */))
      )
    )
  )
)

And here is the codec itself, which is probably interesting only to Scala developers. All it does is convert protobuf messages to and from base85 strings:

// Using https://github.com/fzakaria/ascii85 to decode/encode base85 data
import com.github.fzakaria.ascii85.Ascii85
// And https://scalapb.github.io to generate Scala classes from protobuf messages
import scalapb.GeneratedMessage
import scalapb.GeneratedMessageCompanion

object ProtobufUtils {

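  def fromBase85String[Message <: GeneratedMessage](value: String)(implicit
    mComp: GeneratedMessageCompanion[Message]
  ): Option[Message] =
    // callback_data is untrusted input: wrap the base85 decoding in Try as well,
    // in case the decoder throws on malformed input
    scala.util.Try(Ascii85.decode(value)).flatMap(bytes => mComp.validate(bytes)).toOption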

  def toBase85String[Message <: GeneratedMessage](s: Message)(implicit
    mComp: GeneratedMessageCompanion[Message]
  ): String = Ascii85.encode(mComp.toByteArray(s))

}
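Because callback_data is limited to 64 bytes, it can help to check the encoded string before attaching it to a button. The following is a minimal sketch, not part of the original codec; `CallbackDataGuard` and `check` are hypothetical names, and the lower bound of 1 byte comes from the Bot API description of callback_data.

```scala
import java.nio.charset.StandardCharsets

object CallbackDataGuard {
  // Upper bound on callback_data, in bytes.
  val MaxBytes = 64

  // Returns the encoded string unchanged if it fits, or an error message otherwise.
  def check(encoded: String): Either[String, String] = {
    val size = encoded.getBytes(StandardCharsets.UTF_8).length
    if (size == 0) Left("callback_data must not be empty")
    else if (size > MaxBytes) Left(s"callback_data is $size bytes, the limit is $MaxBytes")
    else Right(encoded)
  }
}
```

It could be used as `CallbackDataGuard.check(ProtobufUtils.toBase85String(message))`, so that an oversized message is caught when the keyboard is built rather than when the API call fails.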

Conclusion

As you can see, callbacks are now type-safe and well-organized, the format is harder to tamper with, and there is far less room for mistakes. Compare that with the basic approach, where you have to parse arbitrary strings using regular expressions or startsWith, and remember how to construct those strings correctly every time.

Thank you for reading this little article. I hope it will be useful to someone. Feel free to reach out to me if you have something to say.
