Methods
To encode our source documents, we used CBML (Comic Book Markup Language), a TEI-based vocabulary. For example, when a character is speaking in a speech bubble, it is encoded as shown below.
<cbml:balloon type="speech" who="#narrator">
<p>THE HISTORY OF LIFE ON OUR PLANET SPANS 4.5 BILLIONS YEARS...</p>
</cbml:balloon>
Below, on the left, is a page from Diavolo Emerges Part 1; on the right is the same page encoded in CBML.
<pb n="4"/>
<cbml:panel n="1" characters="#trish">
<cbml:balloon type="speech" who="#trish">
<p>TH... THE <emph rendition="#b">POWER</emph> TO DEFEAT THE BOSS!? YEAH, RIGHT!</p>
</cbml:balloon>
<cbml:balloon type="speech" who="#guido-mista #trish->mista">
<p>IT'S BECAUSE YOU BROUGHT THE <emph rendition="#b">ARROW</emph>...</p>
</cbml:balloon>
</cbml:panel>
<cbml:panel n="2" characters="#trish #turtle->polnareff">
<cbml:balloon type="speech" who="#trish">
<p>P...</p>
</cbml:balloon>
<cbml:balloon type="speech" who="#trish">
<p>POLNA-REFF, YOU PIECE OF SHIT!</p>
</cbml:balloon>
<cbml:balloon type="speech" who="#turtle->polnareff">
<p>IT'LL HAPPEN AT DIFFERENT RATES FOR ALL OF US, BUT IT WILL HAPPEN!</p>
</cbml:balloon>
<cbml:balloon type="speech" who="#turtle->polnareff">
<p>IT'S FINALLY TIME TO BEGIN...</p>
</cbml:balloon>
</cbml:panel>
<cbml:panel n="3" characters="#guido-mista #diavolo #trish #turtle">
<cbml:balloon type="speech" who="#trish">
<p><emph rendition="#b">ALL YOU'VE DONE IS TIE A KNOT AROUND ALL OUR NECKS!</emph></p>
</cbml:balloon>
<cbml:balloon type="speech" who="#diavolo">
<p>STOP IT, MISTA!</p>
</cbml:balloon>
<cbml:balloon type="speech" who="#diavolo">
<p>NOBODY COULD HAVE PREDICTED THIS WOULD HAPPEN! WE'D BE ALL DEAD IF IT WEREN'T FOR THE <emph rendition="#b">ARROW</emph></p>
</cbml:balloon>
<cbml:balloon type="speech" who="#turtle->polnareff">
<p>AND THE SAME THING WOULD HAVE HAPPENED ALREADY, AS SOON AS ANYONE ELSE FOUND THE <emph rendition="#b">ARROW!</emph></p>
</cbml:balloon>
</cbml:panel>
<cbml:panel n="4" characters="#turtle">
<cbml:balloon type="speech" who="#trish">
<p>. . . . . . . . . . . . . . . . . .</p>
</cbml:balloon>
</cbml:panel>
<cbml:panel n="5" characters="#silver-chariot #guido-mista #trish #diavolo #giorno #turtle">
<cbml:balloon type="speech" who="#group">
<p>FIND A WAY TO DEFEAT REQUIEM!</p>
</cbml:balloon>
<cbml:balloon type="speech" who="#group">
<p>WE HAVE TO GET THE <emph rendition="#b">ARROW</emph> SOMEHOW!</p>
</cbml:balloon>
<cbml:balloon type="speech" who="#group">
<p><emph rendition="#b">THE BOSS</emph> IS GONNA MAKE HIS MOVE, TOO, WHEREVER HE IS!</p>
</cbml:balloon>
</cbml:panel>
Our XSLT
Below is the XSLT we used to convert our chapters from CBML to HTML. It outputs a single list with one item per balloon, giving the speaker followed by the dialogue.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.tei-c.org/ns/1.0"
xmlns:cbml="http://www.cbml.org/ns/1.0"
xmlns="http://www.w3.org/1999/xhtml"
version="3.0">
<xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes"
include-content-type="no" indent="yes"/>
<xsl:template match="/">
<html>
<head>
<title>Diavolo Emerges Part 1</title>
</head>
<body>
<h1>All Dialogue</h1>
<ul>
<xsl:apply-templates select="//cbml:balloon"/>
</ul>
</body>
</html>
</xsl:template>
<xsl:template match="cbml:balloon">
<li>
<b><xsl:value-of select="@who"/>: </b>
<xsl:apply-templates/>
</li>
</xsl:template>
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="emph">
<b><xsl:apply-templates/></b>
</xsl:template>
<xsl:template match="sound">
<i>(<xsl:apply-templates/>)</i>
</xsl:template>
</xsl:stylesheet>
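As a rough illustration of what the stylesheet does, here is a minimal Python sketch that walks a CBML fragment the same way: one entry per cbml:balloon, labelled with its who attribute. This is not one of our project files. The SAMPLE fragment (a single panel wrapped in a TEI div) and the function name list_dialogue are assumptions made for the example; the namespace URIs are the ones declared in the stylesheet above. Unlike the XSLT, it returns plain text, so the bold styling of emph is dropped.

```python
import xml.etree.ElementTree as ET

# CBML namespace URI, copied from the stylesheet's declarations.
NS = {"cbml": "http://www.cbml.org/ns/1.0"}

# Hypothetical, trimmed-down input: one panel wrapped in a TEI <div>.
SAMPLE = """\
<div xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:cbml="http://www.cbml.org/ns/1.0">
  <cbml:panel n="5" characters="#trish #giorno">
    <cbml:balloon type="speech" who="#group">
      <p>WE HAVE TO GET THE <emph rendition="#b">ARROW</emph> SOMEHOW!</p>
    </cbml:balloon>
    <cbml:balloon type="speech" who="#trish">
      <p>P...</p>
    </cbml:balloon>
  </cbml:panel>
</div>
"""


def list_dialogue(xml_text):
    """Return (who, text) pairs, one per cbml:balloon, in document order."""
    root = ET.fromstring(xml_text)
    pairs = []
    for balloon in root.iterfind(".//cbml:balloon", NS):
        who = balloon.get("who", "")
        # itertext() flattens child elements such as <emph>; split/join
        # collapses the indentation whitespace around the text.
        text = " ".join("".join(balloon.itertext()).split())
        pairs.append((who, text))
    return pairs


if __name__ == "__main__":
    for who, text in list_dialogue(SAMPLE):
        print(f"{who}: {text}")
```

Each pair corresponds to one list item in the HTML output: the who value is what the stylesheet puts in bold, and the text is the balloon's content.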
How we scraped the episode files from the web.
Link to GitHub if page is not loading.
How we converted our text files to XML.
We first needed to capitalize the first letter of every line so that the files could be run through our XProc/ixml.
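A minimal sketch of this kind of capitalization step, assuming the episode files are plain text and that only the first character of each line needs to change. The function name capitalize_lines and the sample text are made up for illustration and are not taken from our Episode Capitalizer script.

```python
def capitalize_lines(text):
    """Uppercase the first character of every line, leaving the rest untouched."""
    out = []
    for line in text.splitlines(keepends=True):
        # line[:1] is safe on empty strings. str.capitalize() is avoided
        # because it would also lowercase the remainder of the line.
        out.append(line[:1].upper() + line[1:])
    return "".join(out)


if __name__ == "__main__":
    # Made-up sample text, not a line from the episode files.
    sample = "first speaker: Hello THERE.\n\nsecond speaker: hi.\n"
    print(capitalize_lines(sample), end="")
```

Blank lines and line endings pass through unchanged. A line that begins with whitespace is also left as-is, since its first character has no uppercase form.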
Episode Capitalizer
Converting everything to XML all at once.
Once our ixml was working on all of our files, we ran our XProc to convert everything to XML in one pass.
Our XProc