🎭 Enhanced Telegram's callback_data with protobuf + base85
If you have ever developed a Telegram bot, you probably know what callback_data is. If not: in short, it’s an arbitrary string attached to an inline keyboard button, which your backend can use to determine exactly which button was pressed.
As your bot grows, your callback_data probably becomes a mess. That’s what I experienced, at least. So today I want to share a novel method for dealing with this mess.
What’s wrong with callback_data?
I presume you are already acquainted with the Bot API. To get a more accurate understanding of the issue, let’s examine a few examples. I will write the code in Scala, but I believe the fundamental concept can be applied in any programming language.
Imagine you have a bot that manages a list of something. Let’s say it’s a list of users:
- Your application has a /ls command, which makes the bot respond with a message containing a numbered list of users and an inline_keyboard for choosing a user to interact with.
- Each button has callback_data set to info_${id}, where ${id} is a user id.
- When a user clicks a button, your bot responds with a message that contains information about the chosen user and an inline_keyboard with buttons like “Delete”, “Ban”, and “Assign a category”. These buttons have callback_data set to delete_${id}, ban_${id}, and assign_category_${id}, respectively.
- When a user clicks “Assign a category”, your bot displays a hardcoded numbered list of categories, again with an inline_keyboard for choosing the category to assign. These buttons have callback_data that looks like assign_category_${id}_${categoryId}, where ${id} is the user’s id and ${categoryId} is the chosen category.
- Now imagine that you also have to update your initial info message after assigning a category. Your callback_data becomes at least something like assign_category_${id}_${categoryId}_${messageId} 🤡.
While it looks okay to handle something like info_${id}, ban_${id}, and delete_${id}, more complex cases like assign_category_${id}_${categoryId}_${messageId} look weird and are hard to manage, especially when you have a lot of them.
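To make the problem concrete, here is a minimal sketch of what handling these strings tends to look like. All names here (PlainCallback, PlainCallbacks, and the case classes) are hypothetical and exist only for illustration; they are not taken from my bot.

```scala
// Hypothetical model of the string-based callbacks described above.
sealed trait PlainCallback
final case class ShowInfo(id: Int) extends PlainCallback
final case class Ban(id: Int) extends PlainCallback
final case class Delete(id: Int) extends PlainCallback
final case class AssignCategory(id: Int, categoryId: Int, messageId: Int) extends PlainCallback

object PlainCallbacks {
  // One hand-written pattern per callback; each must stay in sync with the
  // code that builds the string on the button side.
  private val InfoRe   = """info_(\d+)""".r
  private val BanRe    = """ban_(\d+)""".r
  private val DeleteRe = """delete_(\d+)""".r
  private val AssignRe = """assign_category_(\d+)_(\d+)_(\d+)""".r

  // Builder for the most complex case; the field order is easy to get wrong.
  def assignCategory(id: Int, categoryId: Int, messageId: Int): String =
    s"assign_category_${id}_${categoryId}_${messageId}"

  // Returns None for anything that does not match a known pattern exactly,
  // including ids that do not fit into an Int.
  def parse(data: String): Option[PlainCallback] = data match {
    case InfoRe(id)   => id.toIntOption.map(ShowInfo(_))
    case BanRe(id)    => id.toIntOption.map(Ban(_))
    case DeleteRe(id) => id.toIntOption.map(Delete(_))
    case AssignRe(id, categoryId, messageId) =>
      for {
        i <- id.toIntOption
        c <- categoryId.toIntOption
        m <- messageId.toIntOption
      } yield AssignCategory(i, c, m)
    case _ => None
  }
}
```

Note that the two-id form assign_category_${id}_${categoryId} from the earlier step is not covered by the three-id pattern, so it would need yet another regex and yet another case class. That is exactly the kind of bookkeeping that piles up.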
Moreover, malicious users could try to attack your application by sending unexpected callback_data content, and that is much easier for them when you use such a plain, predictable format. You must check access regardless of the format you use, but an opaque format is still one more barrier.
How to fix this mess using protobuf + base85
In my Advanced Link Saver bot (a small article about tech stack), I have a lot of complex scenarios, and handling them was a nightmare. That’s why I implemented the following approach to managing callback_data:
- Describe every callback using a protobuf message.
- callback_data is no longer a plain string like info_${id}, but base85-encoded protobuf bytes.
- Handlers decode the base85 string and then parse the underlying protobuf message.
- Then you match the result against type-safe protobuf messages.
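The flow in the list above can be tried out with nothing but the standard library. The sketch below is only a stand-in, not my actual codec: it swaps base85 for java.util.Base64 and protobuf for a hand-written "one tag byte plus one Int" layout, and every name in it (TypedCallback, LinkInfo, CategoryInfo, TypedCallbackCodec) is hypothetical. What it keeps is the shape of the idea: typed value to bytes to button-safe text, and back.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.util.Base64
import scala.util.Try

// Hypothetical typed callbacks; in the real approach these are protobuf messages.
sealed trait TypedCallback
final case class LinkInfo(linkId: Int) extends TypedCallback
final case class CategoryInfo(categoryId: Int) extends TypedCallback

object TypedCallbackCodec {
  // Typed value -> bytes (1 tag byte + 4-byte Int) -> text for callback_data.
  // Base64 stands in for base85 here.
  def encode(cb: TypedCallback): String = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    cb match {
      case LinkInfo(id)     => out.writeByte(1); out.writeInt(id)
      case CategoryInfo(id) => out.writeByte(2); out.writeInt(id)
    }
    out.flush()
    Base64.getUrlEncoder.withoutPadding.encodeToString(bytes.toByteArray)
  }

  // Text -> bytes -> typed value. Any malformed input yields None:
  // invalid Base64, wrong length, or an unknown tag.
  def decode(data: String): Option[TypedCallback] =
    Try(Base64.getUrlDecoder.decode(data)).toOption.flatMap { bytes =>
      if (bytes.length != 5) None
      else {
        val in = new DataInputStream(new ByteArrayInputStream(bytes))
        in.readUnsignedByte() match {
          case 1 => Some(LinkInfo(in.readInt()))
          case 2 => Some(CategoryInfo(in.readInt()))
          case _ => None
        }
      }
    }
}
```

A handler then pattern-matches on the decoded value (case Some(LinkInfo(id)) => ...) instead of picking a string apart, which is the same matching step that the protobuf version performs on generated classes.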
Let’s see how it looks in practice. For example, I have callbacks for viewing information about links and about categories. My protobuf definitions look like this:
message InfoCategory {
  uint32 categoryId = 1;
}
message InfoLink {
  uint32 linkId = 1;
}
message Info {
  oneof callbackData {
    InfoLink infoLink = 1;
    InfoCategory infoCategory = 2;
  }
}
My handler now looks like this:
class InfoCallbackHandler() {
  override def handle(callback: CallbackQuery) =
    (
      callback
        .data // this variable is a plain `callback_data` string
        .flatMap(ProtobufUtils.fromBase85String[Info])
        .map(_.callbackData)
    ) match {
      case Some(Info.CallbackData.InfoCategory(InfoCategory(categoryId))) =>
        // ...
      case Some(Info.CallbackData.InfoLink(InfoLink(linkId))) =>
        // ...
      case _ =>
        ZIO.fail(new IllegalArgumentException())
    }
}
This is how I fill in callback_data on buttons:
val infoButton = InlineKeyboardButton(
  text = "Info",
  callbackData = Some(
    ProtobufUtils.toBase85String(
      Info(
        Info.CallbackData.InfoLink(InfoLink(link.id /* int */))
      )
    )
  )
)
And here is the codec itself, though it is probably interesting only to Scala developers. It simply encodes protobuf messages to base85 strings and decodes them back:
// Using https://github.com/fzakaria/ascii85 to decode/encode base85 data
import com.github.fzakaria.ascii85.Ascii85
// And https://scalapb.github.io to generate Scala classes from protobuf messages
import scalapb.GeneratedMessage
import scalapb.GeneratedMessageCompanion
import scala.util.Try

object ProtobufUtils {
  // Returns None if the string is not valid base85 or the bytes do not parse as Message.
  def fromBase85String[Message <: GeneratedMessage](value: String)(implicit
      mComp: GeneratedMessageCompanion[Message]
  ): Option[Message] =
    // Try also covers the case where the base85 decoder throws on malformed input
    Try(Ascii85.decode(value)).flatMap(mComp.validate).toOption

  def toBase85String[Message <: GeneratedMessage](s: Message)(implicit
      mComp: GeneratedMessageCompanion[Message]
  ): String = Ascii85.encode(mComp.toByteArray(s))
}
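One practical constraint is worth keeping in mind: the Bot API limits callback_data to 1-64 bytes, so the encoded message has to stay small. Standard Ascii85 turns every full 4-byte group into 5 characters and a trailing group of r bytes into r + 1 characters, which gives roughly 51 bytes of protobuf payload per button. The helper below is a hypothetical sketch of that arithmetic; it assumes the encoder adds no <~ ~> delimiters and ignores the "z" shortcut for all-zero groups, which can only make the output shorter.

```scala
import java.nio.charset.StandardCharsets

object CallbackDataBudget {
  // Bot API limit for callback_data, in bytes.
  val MaxCallbackDataBytes = 64

  // Upper bound on the Ascii85 output length for a payload of `payloadBytes`
  // bytes: 5 chars per full 4-byte group, r + 1 chars for a trailing group
  // of r bytes. Assumes that no "<~ ~>" delimiters are added.
  def ascii85MaxLength(payloadBytes: Int): Int = {
    require(payloadBytes >= 0, "payload size must be non-negative")
    val fullGroups = payloadBytes / 4
    val rest = payloadBytes % 4
    fullGroups * 5 + (if (rest == 0) 0 else rest + 1)
  }

  // True if an already-encoded string is acceptable as callback_data.
  def fits(encoded: String): Boolean = {
    val size = encoded.getBytes(StandardCharsets.UTF_8).length
    size >= 1 && size <= MaxCallbackDataBytes
  }
}
```

A check like CallbackDataBudget.fits(ProtobufUtils.toBase85String(message)) before building a button would catch oversized messages early, rather than leaving it to the Bot API to reject them.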
Conclusion
As you can see, everything is now type-safe, well organized, and harder to tamper with, and there is far less room for mistakes. Compare that with the basic approach, where you parse arbitrary strings with regular expressions or startsWith and have to remember how to construct those strings correctly.
Thank you for reading this little article. I hope it will be useful to someone. Feel free to reach out if you have something to say.