Personally, I would be happy even if it didn’t translate it but were able to give some half decent transcription of, at least, English voice into English text. I prefer having subtitles, even when I speak the language, because it helps in noisy environments and/or when the characters mumble / have weird accents.
However, even that would likely be difficult with a lightweight model. Even big companies like Google often struggle with their autogenerated subtitles. When there’s some very context-specific terminology, or uncommon names, it fumbles. And adding translation to an already incorrect transcript multiplies the nonsense, even if the translation were technically correct.
AFAIK, Voxelibre was renamed from “MineClone2” which was a fork from “MineClone” which is a different game than “Minetest game”