TextTron 3000™
Feb 09, 2016
Around a year ago, I posted some source code from Thimbleweed Park and was immediately demonized for including text in my source code. Cries of "How dare you!" quickly followed by readers covering the eyes of young ones.
Yes, text in source code is a really bad idea, but this isn't my first trip to the rodeo. I have a plan, a very cunning plan, a plan I have used many times in the past, a plan that I will now share with you.
First I need to state that I like text in source code, especially when it comes to adventure games. Text is an integral part of the game and it is woven into just about everything coded, from funny object descriptions to charming comebacks and moving dialogs.
Coding an adventure game is a very creative process. It's not cleanly divided into coders, designers and writers. I expect the people doing the game coding to be a little bit of all three of those things, which is why I like the text intermixed with the code. You can read the code and get a feel for what is happening. You can tweak code and text at the same time, getting everything to play just perfectly. You can't adjust one without the other.
But it does create a dilemma. If the game was in one language and never to be translated or voice recorded, you could blindly add text to your code without a care or worry, but that's not the world modern games live in.
Let's look at the process. Here is some code from the Thimbleweed Nickel:
{
    name = "empty frame"
    verbLookAt = function()
    {
        sayLine("Very abstract... not a great use of color.",
                "...though I do like the subject matter.")
    }
    verbPull = function()
    {
        sayLine("It won't budge.")
    }
}
We sit with the text and code like this for quite a while. There is no rush to extract it. Once extracted, you can still change it, inserting and deleting at will, but it's a little harder.
I wrote a tool in python called text_tron.py that does all the extracting (more on that later). I run the tool...
which transforms the above code into...
{
    name = TEXT(0,"empty frame")
    verbLookAt = function()
    {
        sayLine(PLAYER(0,"Very abstract... not a great use of color."),
                PLAYER(0,"...though I do like the subject matter."))
    }
    verbPull = function()
    {
        sayLine(PLAYER(0,"It won't budge."))
    }
}
The lines have been extracted, but they haven't been assigned numbers or placed in the text database yet.
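To make that first pass concrete, here is a rough Python sketch of what such an extraction step could look like. It is not the actual text_tron.py: the function name `extract_text` and the simplified patterns are made up for illustration, and it assumes every string is double-quoted and that all sayLine text belongs to the player.

```python
import re

# Simplified stand-ins for the real patterns (assumption: double-quoted strings only).
STRING = r'"(?:\\.|[^"\\])*"'
SAYLINE = re.compile(r'sayLine\(\s*((?:' + STRING + r'\s*,?\s*)+)\)', re.DOTALL)
NAME = re.compile(r'\bname\s*=\s*(' + STRING + r')')
QUOTED = re.compile(STRING)

def extract_text(source, speaker="PLAYER"):
    """Wrap bare strings in TEXT(0,...) or SPEAKER(0,...); 0 means 'not numbered yet'."""
    def wrap_call(match):
        # Wrap each quoted argument of one sayLine(...) call.
        args = QUOTED.sub(lambda q: f"{speaker}(0,{q.group(0)})", match.group(1))
        return f"sayLine({args})"
    source = NAME.sub(lambda m: f"name = TEXT(0,{m.group(1)})", source)
    return SAYLINE.sub(wrap_call, source)
```

Because strings that are already wrapped in a macro no longer sit directly after `sayLine(` or `name =`, running this sketch a second time leaves the source unchanged.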
At some later point (days, not months), I'll run this...
...and the code becomes...
{
    name = TEXT(10087,"empty frame")
    verbLookAt = function()
    {
        sayLine(PLAYER(10088,"Very abstract... not a great use of color."),
                PLAYER(10089,"...though I do like the subject matter."))
    }
    verbPull = function()
    {
        sayLine(PLAYER(10090,"It won't budge."))
    }
}
...and a .tsv file is written (or added) to...
PLAYER 10088 Very abstract... not a great use of color. Nickel 588
PLAYER 10089 ...though I do like the subject matter. Nickel 589
PLAYER 10090 It won't budge. Nickel 593
The file the line came from, along with the line number, is stored to make sorting and printing recording scripts easier. It's good to know where the line came from.
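The numbering pass can be sketched along these lines. This is only an illustration under assumptions: `assign_ids`, its parameters and the row layout are hypothetical, and the real tool may number and store lines differently.

```python
import re

# Matches MACRO(0,"text"), i.e. lines that were extracted but not numbered yet.
UNNUMBERED = re.compile(r'([A-Z][A-Z0-9_]+)\(\s*0\s*,\s*"((?:\\.|[^"\\])*)"\s*\)')

def assign_ids(source, next_id, file_label):
    """Replace each 0 with a fresh number and collect one database row per line."""
    rows = []
    out_lines = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        def number(match):
            nonlocal next_id
            macro, text = match.group(1), match.group(2)
            rows.append((macro, next_id, text, file_label, line_no))
            replaced = f'{macro}({next_id},"{text}")'
            next_id += 1
            return replaced
        out_lines.append(UNNUMBERED.sub(number, line))
    return "\n".join(out_lines), rows, next_id

def to_tsv(rows):
    """Format rows as tab-separated lines: actor, id, text, file, source line."""
    return "\n".join("\t".join(str(field) for field in row) for row in rows)
```

Returning `next_id` lets the caller carry the counter over to the next source file so numbers stay unique across the game.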
TEXT and PLAYER are macros.
#macro AGENT($a,$b) "@$a:$b"
#macro PLAYER($a,$b) "@$a:$b"
#macro RAY($a,$b) "@$a:$b"
#macro REYES($a,$b) "@$a:$b"
...
#macro NATALIE($a,$b) "@$a:$b"
#macro POSTALWORKER($a,$b) "@$a:$b"
#macro CHET($a,$b) "@$a:$b"
#macro SHERIFF($a,$b) "@$a:$b"
They just transform each line of text from "Very abstract... not a great use of color." to "@10088:Very abstract... not a great use of color."
In the final build of the game, the #macros will get replaced with this...
#macro AGENT($a,$b) "@$a"
...and the text will get stripped out, leaving only the line number.
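The effect of the two macro definitions can be imitated in a few lines of Python. This is a sketch of the string transformation only, not of the engine's preprocessor; `expand_macros` and its `final_build` flag are names invented for the example.

```python
import re

# Matches MACRO(id,"text") for any upper-case macro name.
MACRO_CALL = re.compile(r'([A-Z][A-Z0-9_]+)\(\s*([0-9]+)\s*,\s*"((?:\\.|[^"\\])*)"\s*\)')

def expand_macros(source, final_build=False):
    """Turn MACRO(id,"text") into "@id:text" (development) or "@id" (final build)."""
    def expand(match):
        line_id, text = match.group(2), match.group(3)
        return f'"@{line_id}"' if final_build else f'"@{line_id}:{text}"'
    return MACRO_CALL.sub(expand, source)
```

Note that the actor name disappears in both cases: it only matters to the tools, not to the running game.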
When the game engine goes to display text, if it begins with '@', it knows a number follows and that is looked up in the text database and the (possibly translated) text is used. During development, we include the text in the string so it can be updated, changed and checked against the database.
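In Python terms, the lookup might look something like the sketch below. The function name `resolve_text`, the dict-based database and the fallback to the embedded text when a number is missing are all assumptions made for the example, not a description of the real engine.

```python
def resolve_text(raw, database):
    """Return the database text for "@id" or "@id:text"; pass plain strings through."""
    if not raw.startswith("@"):
        return raw
    id_part, sep, embedded = raw[1:].partition(":")
    if not id_part.isdigit():
        return raw
    # Assumption: if the id is unknown, fall back to the embedded text (if any).
    fallback = embedded if sep else raw
    return database.get(int(id_part), fallback)
```

For example, with a made-up database `{10090: "Es bewegt sich nicht."}`, both `"@10090:It won't budge."` and `"@10090"` resolve to the translated line.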
Rule #1 of Translation Club: Never ever ever edit the text in the database. If there is a change, make it to the source, then you run...
...and any changes are moved from the source to the database.
Text is usually locked before sending it to the translators, so nothing should change after that. If it does, you make the changes in the source and run --update again. This will add the lines to the translation files as well.
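Here is a minimal sketch of the idea behind --update, with the source treated as the single point of truth. The function `update_database` and the dict keyed by line number are hypothetical; the real tool works on the .tsv files and the translation files.

```python
import re

# Matches MACRO(id,"text") for any upper-case macro name.
MACRO_CALL = re.compile(r'([A-Z][A-Z0-9_]+)\(\s*([0-9]+)\s*,\s*"((?:\\.|[^"\\])*)"\s*\)')

def update_database(source, database):
    """Copy changed text from the source into the database; return the changed ids."""
    changed = []
    for match in MACRO_CALL.finditer(source):
        line_id, text = int(match.group(2)), match.group(3)
        if line_id == 0:
            continue  # extracted but not numbered yet, so not in the database
        if database.get(line_id) != text:
            database[line_id] = text
            changed.append(line_id)
    return changed
```

Data only ever flows from source to database here, which is the whole point of Rule #1.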
Text databases can also be hotloaded, so a translator can edit the .tsv files, hit a key in the already running game and the new text is loaded and used. It should make translating easier.
I gave Boris the Nickel and he translated it and is still speaking to me, so I guess it went OK. Right Boris? Boris? Hello? Boris?
The extractor (text_tron.py) is pretty simple and driven by regex. All of the regular expressions used are...
re_sayLine = re.compile("sayLine\\(\\s*((\\\"(\\\\.|[^\"])+\\\"(\\s*,\\s*)?)+)\\s*\\)", flags=re.DOTALL)
re_quoteText = re.compile("(\"(\\\\.|[^\"])*\")", flags=re.DOTALL)
re_objectName = re.compile("name\\s*=\\s*(\"(\\\\.|[^\"])+\")")
re_addText = re.compile("([A-Z][A-Z0-9_]+)\\(\\s*([0-9]+)\\s*,\\s*\"((\\\\\"|[^\\\"])*)\"\\s*\\)")
re_FindMacro = re.compile("([A-Z][A-Z_0-9]+)\\(([0-9]+)\\s*,\\s*\\\"((\\\\.|[^\\\"])+)\\\"\\s*\\)")
re_yackOption = re.compile("^\\s*([0-9])\\s+(?![A-Z]+\\()\"?(.*?)\"?\\s*->", flags=re.MULTILINE)
re_yackSayLineCond = re.compile("^\\s*([a-zA-Z]+):\\s+(?![A-Z]+\\()\"?(.*)\"?(\\s+\\[)", flags=re.MULTILINE)
re_yackSayLine = re.compile("^\\s*([a-zA-Z]+):\\s+(?![A-Z]+\\()\"?(.*)\"?", flags=re.MULTILINE)
These grab about 99% of the text. Every so often we do something goofy and have to add the macros by hand, but it would take me longer to write the extraction code than to just change the game source. It's a small edge case.
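To see what one of these patterns captures, here is re_FindMacro from the list above applied to a line of game source. The helper `find_macros` is just a name for this demo and is not part of the tool.

```python
import re

# Copied from the list of regular expressions above.
re_FindMacro = re.compile("([A-Z][A-Z_0-9]+)\\(([0-9]+)\\s*,\\s*\\\"((\\\\.|[^\\\"])+)\\\"\\s*\\)")

def find_macros(source):
    """Return (actor, line number, text) for every MACRO(id,"text") in the source."""
    return [(m.group(1), int(m.group(2)), m.group(3)) for m in re_FindMacro.finditer(source)]

print(find_macros('sayLine(PLAYER(10090,"It won\'t budge."))'))
```

Group 1 is the macro (actor) name, group 2 the line number and group 3 the text between the quotes.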
"Wait... what about dialog?"
Good question...
That is why we have different #macros for each of the actors. You notice that the actor name was placed in the .tsv file...
NATALIE 10091 Welcome to the Thimbleweed Nickel. Nickel 593
This serves no other useful purpose than to tag actors when exporting scripts. Because there are five playable characters, some lines need to be read by five different actors. When the script is exported, it sees PLAYER, and knows it needs to emit scripts for RAY, REYES, RANSOME, and DELORES. When the audio is recorded, it will be saved into four files...
RAY_10090.wav
REYES_10090.wav
RANSOME_10090.wav
DELORES_10090.wav
It's the same line (hence the 10090), but it's read by four different people.
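The PLAYER expansion during script export could be sketched like this. The list of actors comes from the names above; the function `recording_files` and the idea of computing file names this way are assumptions for illustration.

```python
# Actors that record PLAYER lines, as listed above.
PLAYER_ACTORS = ["RAY", "REYES", "RANSOME", "DELORES"]

def recording_files(actor, line_id):
    """Return the audio file names to record for one database line."""
    actors = PLAYER_ACTORS if actor == "PLAYER" else [actor]
    return [f"{name}_{line_id}.wav" for name in actors]
```

A PLAYER line fans out to one file per actor, while a line tagged with a specific actor such as NATALIE yields a single file.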
"Hey, what about Franklin? Did you cut him from the game? I want my Kickstarter money back!"
No, he's not cut from the game, but he's a ghost and he has very different interactions, so we don't need to record him for most of the lines.
- Ron
Easy and performing!!
Still love the game (or what I have seen so far) and your blogposts! Thanks to the whole Thimbleweed-Team!
TEXT 10087 empty frame Nickel 581
PLAYER 10088 Very abstract... not a great use of color. Nickel 588
2) Does the engine allow simultaneous dialog lines spoken by different characters?
For our C64 point-and-click each line is stored in two languages in the script.
Not very efficient but I fully second your notion of texts belonging to scripts (at least during scripting).
I want chipmunk agents!
Regular expressions are very, very important.
Like Wally's maps.
"\\\"" matches the opening quote, followed by one or more (see the "+" suffix) of what is in the parentheses, that is,
- either an escape sequence ("\\\\.", a backslash followed by any character (that's what the dot means))
- or anything but a quote. "[^...]" will match any character but those found in the brackets, following the caret. In this case "\"", i.e. a double quote. The pipe ("|") character means "either".
At last, a final double quote.
Note to Ron: I'm not familiar with Python regexes, but I wonder if the quotes couldn't also be matched by just "\"" rather than "\\\"".
http://xkcd.com/1638/
But why would you have a backslash in the text?
Then I wanted to provide a translation system, and that's where I extracted all the lines and replaced them with IDs... however, if I look at the code now, I see just horrible things like "say("ball.description1"); say("ball.description2");" and I can't think of programming this way.
What I'm doing now is hardcoding it anyway, prefacing it with a context (like say("ball.desc","My bowling ball"); say("ball.desc","I never lost a game with it");) then, at runtime, make the engine look up for the corresponding pair context/string in the language files for the current language. If present, it returns the translation. If not, it returns the English line just to be sure.
I'm lacking a script that extracts everything and automatically keeps the language files up to date, but for now that's not a problem. I don't have deadlines, that's (unfortunately) not my day job.
If you had to translate a simple word like "ball", how could you tell a translator if you're referring to the spherical object instead of a formal dance, without having him play the game to find out?
Do you miss the ß?
Orthographically it's derived from "wise", not "white".
So whether 1986 or the future (now), that's how it is spelled.
The "orthography" remark was also meant as a funny note for Boris to generally stick in 1986 (and not today's orthography) ;-)
However I wonder what the Duden says about the case of "seltsames"......
The word "Naseweis" always reminds me of a german poem by Anna Ritter, which I was annually ordered to listen to attentively on Saint Nicholas Day ("... was drin war, möchtet ihr wissen? Ihr Naseweise, ihr Schelmenpack - meint ihr, er wäre offen, der Sack?..."). Therefore it sounds extremely 1986-ish to me as well, even though the poem is a bit older.
By the way: "Schelmenpack" (archaic for something like "rascal scalawag") is also a nice word, albeit it's probably not in the game (yet). I like those "old-school" taunts, as they sound rather funny than harsh nowadays.
PS: It confirms the pleasing fact that the above translation is written by the same person who translated the first Monkey Island game into German. For instance, the general dealer on Mêlée Island uses the word "Grünschnabel" (archaic for "greenhorn") when you enter his store. Maybe I'm about to over-interpret, but in my opinion there is a similar (subliminally humorous) language use. Anyway, I think that Boris is actually the best choice for the German translation!
I too would like to read (nowadays) uncommon swearwords like "Tunichtgut" ,"Dämlack" or "Xanthippe".
Because they are not so evil they don't need to be partly replaced with "tuna.."
I'm really looking forward to the equivalent of the "root beer" -> "grog" translation :)
I've expressed my concerns earlier on the blog, the biggest of which is that a simple regex won't catch everything. Obviously, they can go in and manually edit stuff and if anything still gets omitted, it can be fixed with a post-release patch. It's not the end of the world by a long shot. The other, lesser, concern is that it's not flexible enough to handle dynamic text in interesting ways (even if TP doesn't use any, it's always a good idea to be forward-looking because it might save costs some day).
That said, I'm not sure what the point of the macros is. Might as well use the English strings as in-source keys directly and save a step. Regex would still be used for collecting the English strings, of course.
Development is quite often dirty and everyone is aware of that. Don't give more weight than necessary to my words because a negative code review (not that this was one) is not at all like saying something sucks and needs to be defended against the evil troll before everyone gets their feelings hurt. To me, it's more like shouting at a smoke detector for detecting smoke that you think is not dangerous (which may or may not be the case but the reaction is just as silly regardless).
Happy now?
Nothing tends to get missed because I can set a mode where the system will no longer accept pure text without throwing up warnings. It would only get missed if the testers never saw it, which is a whole different kind of failing. :-)
I've admitted in that wall of text I wrote the last time that natural-sounding recordings make dynamic text a lot more complicated (even if I think there are some solutions worth exploring) but I'm really thinking more about things like text on a newspaper in a close-up influenced by events in the game.
- Darth Vader
After me the deluge :)
One feature you didn't mention which you may want to consider adding right next to the --update would be a --verify. Then if you're not certain if there have been changes made to either the database OR the source, you could verify one against the other.
Though, there should rather be periods after both "Mordfall" and "Plausch" instead of commas, because there are two main clauses in both sentences.
PS: There are several confusing rules due to the controversial reform in 1996. If anyone is interested in that reform, take a look at: https://en.wikipedia.org/wiki/German_orthography_reform_of_1996
OT I think I love you Ron.
I also think everyone is going to be surprised at how good this game is, and how little watching its progress has actually spoiled the surprise.
GotY.
The comma is correct.
http://www.duden.de/sprachwissen/rechtschreibregeln/komma#K117
She studied proofreading, so this time it made sense.
;)
Her opinion: the second comma is correct.
Or can there be different answers to the same actions by different actors?
(E.g. Ray saying "No way!" and Reyes saying "Nope!" or some similar stylistical differences.)
LOL! It took me a second or two to figure out that you guys meant dialog text. :)
-dZ.
I'll admit I didn't grow up in the eighties, but I'm German and I think I have my legitimate doubts about this translation coming across as natural parts of a dialogue.
Just my copper...
P.s.: As far as the already-discussed word 'Naseweis' goes, for example... really? Did people really speak like that? I think we're talking uninspired, maybe bored, toughed-down and too-down-to-earth FED agents, not people who learned their vocabulary by reading theater-plays and reenacting olden fairy tales ; )