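This is issue 1 of the "Exploring File Formats" series: a first look at the "ePub" file format. In this series I walk readers through my own process of exploring various file formats, specifically how a file's contents end up being presented. As a rule, we assume we only know what each format is used for and nothing about how it is implemented (if I happen to know part of it in advance, I will simply pretend I don't). Along the way we will try different approaches to gradually build a rough picture of the format.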
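About the file format
According to the mainland-China Simplified Chinese edition of Wikipedia:
EPub is a free and open standard and a kind of "reflowable" content; that is, the text can be displayed in the way best suited to reading, depending on the characteristics of the reading device.
I stop quoting there because anything further would be a spoiler. Simply put, ePub is a "document-type" file format similar to PDF, commonly used to distribute e-books and the like.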
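The exploration
Environment
I have an ePub file on hand for testing, located at `~/Downloads/咖啡馆推理事件簿系列(全四本).epub` (sneaking in a personal recommendation; it happens to be very much to my taste), and everything that follows is based on this file. My operating system is Manjaro 21.1.0 on amd64, and my shell is GNU bash 5.1.8(1)-release. For convenience, let's rename the file first (then why give the original name at all?!). The file name is quoted here so that the shell does not try to interpret the parentheses:

    [littleye233@lymjrolt Downloads]$ cd ~
    [littleye233@lymjrolt ~]$ cd Downloads
    [littleye233@lymjrolt Downloads]$ mv '咖啡馆推理事件簿系列(全四本).epub' test.epub
    [littleye233@lymjrolt Downloads]$ ll test.epub
    -rw-r--r-- 1 littleye233 littleye233 1253964 Aug 22 23:24 test.epub
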
Round I. File type
Let's first test the waters with `file`, a command that ships with most Linux systems, and see what it prints. Type `file test.epub` and run it:

    [littleye233@lymjrolt Downloads]$ file test.epub
    test.epub: EPUB document EPUB document

Ah, what a pity! `file` gave us almost no useful information. Its `man` page states that the command determines file types, but it can actually do a good deal more. Run on an image file, for example, it may print something like this:

    [littleye233@lymjrolt Downloads]$ file ~/.local/share/osu/screenshots/osu_2021-08-21_23-40-03.png
    /home/littleye233/.local/share/osu/screenshots/osu_2021-08-21_23-40-03.png: PNG image data, 1920 x 961, 8-bit/color RGBA, non-interlaced

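With hints like these from `file` we can follow the trail, look for traces in the file's raw bytes, and infer what the corresponding "sites" mean (some file formats require particular information at particular positions). If the tool gave us something like annotations, so much the better.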
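To make the idea of "fixed positions" concrete, here is a minimal Python sketch of signature sniffing. It is only an illustration in the spirit of `file`, not how `file` is actually implemented (the real tool consults a large database of patterns). The `sniff` function and the hand-built sample bytes are hypothetical; the PNG layout used (an 8-byte signature followed by an IHDR chunk that holds width and height) is part of the PNG format.

```python
import struct

# Leading byte signatures of two formats mentioned in this post.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
ZIP_MAGIC = b"PK\x03\x04"


def sniff(data: bytes) -> str:
    """Guess a file type from its leading bytes (hypothetical helper)."""
    if data.startswith(PNG_MAGIC):
        # After the 8-byte signature comes the IHDR chunk: a 4-byte length,
        # the tag b"IHDR", then big-endian 32-bit width and height.
        width, height = struct.unpack(">II", data[16:24])
        return f"PNG image data, {width} x {height}"
    if data.startswith(ZIP_MAGIC):
        return "Zip archive data"
    return "data"


# A hand-built PNG prefix (not a complete image), using the dimensions
# that appear in the screenshot example above.
sample_png = PNG_MAGIC + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1920, 961)
print(sniff(sample_png))          # PNG image data, 1920 x 961
print(sniff(b"PK\x03\x04rest"))   # Zip archive data
```

The point is simply that the width and height sit at a known offset, which is why a tool can report them without decoding the image.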
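Round II. File structure
Now back to the ePub file. Let's see whether we can read its contents directly, the goal being to guess its structure from the printable characters near the start of the file. Open it with `nano test.epub` for a direct preview, or use `head --bytes=120 test.epub` to look at the first 120 bytes:

    [littleye233@lymjrolt Downloads]$ head --bytes=120 test.epub
    PK!oa�mimetypeapplication/epub+zipPU�N�;�ʯ�META-INF/container.xml]�A �0E�=
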
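Sure enough, there are some interesting strings in there: "mimetypeapplication/epub+zip". From experience, my guess is that this is the header of the ePub file format, and the "zip" in it, together with the leading "PK" (the signature that ZIP archives begin with), suggests that an ePub file may essentially be a compressed archive.
In fact many file formats (Word's "*.docx", for example) are essentially a compressed archive holding assorted resource and configuration files; as long as suitable software reads and processes them, the user sees the result.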
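The string seen at the start of the file can be explained by the layout of a ZIP local file header: a 30-byte fixed part, then the entry name, then (for an uncompressed entry) the raw content. The Python sketch below builds a tiny stand-in archive in memory, since the author's `test.epub` is not available here, and decodes its first entry by hand. `build_sample_epub` and `first_entry` are hypothetical helper names. The sketch assumes the archive was written to a seekable buffer, so the sizes are present in the local header.

```python
import io
import struct
import zipfile


def build_sample_epub() -> bytes:
    """Build a tiny stand-in archive in memory (not the real test.epub)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry is written first and left uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", "<container/>",
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def first_entry(data: bytes):
    """Decode the first ZIP local file header by hand."""
    (sig, _version, _flags, method, _mtime, _mdate, _crc,
     comp_size, _size, name_len, extra_len) = struct.unpack(
        "<4sHHHHHIIIHH", data[:30])
    if sig != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    name = data[30:30 + name_len].decode("utf-8")
    start = 30 + name_len + extra_len
    payload = data[start:start + comp_size]
    return name, method, payload


data = build_sample_epub()
name, method, payload = first_entry(data)
print(name, method, payload)
```

Because the first entry is stored without compression (method 0), its name and content sit next to each other in plain sight, which is why "mimetype" and "application/epub+zip" run together in the `head` output.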
Round III. Directory tree
Now we can use an archive tool to look inside the ePub file. Run `unzip -l test.epub` in a terminal to list its contents:

    [littleye233@lymjrolt Downloads]$ unzip -l test.epub
    Archive:  test.epub
      Length      Date    Time    Name
    ---------  ---------- -----   ----
           20  1980-01-01 00:00   mimetype
          251  2019-06-27 10:40   META-INF/container.xml
        12307  2019-06-27 10:40   OEBPS/content.opf
       112368  2019-06-27 10:40   OEBPS/Images/cover00464.jpeg
       128680  2019-06-27 10:40   OEBPS/Images/image00456.jpeg
       120936  2019-06-27 10:40   OEBPS/Images/image00457.jpeg
         1392  2019-06-27 10:40   OEBPS/Images/image00458.jpeg
       101948  2019-06-27 10:40   OEBPS/Images/image00459.jpeg
       119124  2019-06-27 10:40   OEBPS/Images/image00460.jpeg
         1268  2019-06-27 10:40   OEBPS/Images/image00461.jpeg
        42944  2019-06-27 10:40   OEBPS/Images/image00462.jpeg
       121284  2019-06-27 10:40   OEBPS/Images/image00463.jpeg
         2251  2019-06-27 10:40   OEBPS/Styles/style0001.css
         9816  2019-06-27 10:40   OEBPS/Styles/style0002.css
         2251  2019-06-27 10:40   OEBPS/Styles/style0003.css
         9789  2019-06-27 10:40   OEBPS/Styles/style0004.css
         2251  2019-06-27 10:40   OEBPS/Styles/style0005.css
        29245  2019-06-27 10:40   OEBPS/Styles/style0006.css
         2235  2019-06-27 10:40   OEBPS/Styles/style0007.css
        29914  2019-06-27 10:40   OEBPS/Styles/style0008.css
         2251  2019-06-27 10:40   OEBPS/Styles/style0009.css
          624  2019-06-27 10:40   OEBPS/Text/cover_page.xhtml
          851  2019-06-27 10:40   OEBPS/Text/part0000.xhtml
          561  2019-06-27 10:40   OEBPS/Text/part0001.xhtml
          428  2019-06-27 10:40   OEBPS/Text/part0002.xhtml
         1518  2019-06-27 10:40   OEBPS/Text/part0003.xhtml
          661  2019-06-27 10:40   OEBPS/Text/part0004.xhtml
         2311  2019-06-27 10:40   OEBPS/Text/part0005.xhtml
        55157  2019-06-27 10:40   OEBPS/Text/part0006.xhtml
        58266  2019-06-27 10:40   OEBPS/Text/part0007.xhtml
        59953  2019-06-27 10:40   OEBPS/Text/part0008.xhtml
        49789  2019-06-27 10:40   OEBPS/Text/part0009.xhtml
        66870  2019-06-27 10:40   OEBPS/Text/part0010.xhtml
        57342  2019-06-27 10:40   OEBPS/Text/part0011.xhtml
        67449  2019-06-27 10:40   OEBPS/Text/part0012.xhtml
        16183  2019-06-27 10:40   OEBPS/Text/part0013.xhtml
          561  2019-06-27 10:40   OEBPS/Text/part0014.xhtml
          428  2019-06-27 10:40   OEBPS/Text/part0015.xhtml
         1575  2019-06-27 10:40   OEBPS/Text/part0016.xhtml
          496  2019-06-27 10:40   OEBPS/Text/part0017.xhtml
         1446  2019-06-27 10:40   OEBPS/Text/part0018.xhtml
        52358  2019-06-27 10:40   OEBPS/Text/part0019.xhtml
        75746  2019-06-27 10:40   OEBPS/Text/part0020.xhtml
        63420  2019-06-27 10:40   OEBPS/Text/part0021.xhtml
        57399  2019-06-27 10:40   OEBPS/Text/part0022.xhtml
        58590  2019-06-27 10:40   OEBPS/Text/part0023.xhtml
        40263  2019-06-27 10:40   OEBPS/Text/part0024.xhtml
        66099  2019-06-27 10:40   OEBPS/Text/part0025.xhtml
        15143  2019-06-27 10:40   OEBPS/Text/part0026.xhtml
          561  2019-06-27 10:40   OEBPS/Text/part0027.xhtml
          612  2019-06-27 10:40   OEBPS/Text/part0028.xhtml
         1344  2019-06-27 10:40   OEBPS/Text/part0029.xhtml
          640  2019-06-27 10:40   OEBPS/Text/part0030.xhtml
         6144  2019-06-27 10:40   OEBPS/Text/part0031.xhtml
        25197  2019-06-27 10:40   OEBPS/Text/part0032.xhtml
        54594  2019-06-27 10:40   OEBPS/Text/part0033.xhtml
        87394  2019-06-27 10:40   OEBPS/Text/part0034.xhtml
        97557  2019-06-27 10:40   OEBPS/Text/part0035.xhtml
       109901  2019-06-27 10:40   OEBPS/Text/part0036.xhtml
        17181  2019-06-27 10:40   OEBPS/Text/part0037.xhtml
         5238  2019-06-27 10:40   OEBPS/Text/part0038.xhtml
          561  2019-06-27 10:40   OEBPS/Text/part0039.xhtml
          644  2019-06-27 10:40   OEBPS/Text/part0040.xhtml
         1163  2019-06-27 10:40   OEBPS/Text/part0041.xhtml
         1473  2019-06-27 10:40   OEBPS/Text/part0042.xhtml
        38427  2019-06-27 10:40   OEBPS/Text/part0043.xhtml
        90589  2019-06-27 10:40   OEBPS/Text/part0044.xhtml
        51278  2019-06-27 10:40   OEBPS/Text/part0045.xhtml
        58321  2019-06-27 10:40   OEBPS/Text/part0046.xhtml
        29670  2019-06-27 10:40   OEBPS/Text/part0047.xhtml
        12903  2019-06-27 10:40   OEBPS/Text/part0048.xhtml
         7364  2019-06-27 10:40   OEBPS/toc.ncx
    ---------                     -------
      2422768                     72 files

We can also extract it directly:

    [littleye233@lymjrolt Downloads]$ unzip test.epub -d test_epub
    Archive:  test.epub
     extracting: test_epub/mimetype
      inflating: test_epub/META-INF/container.xml
      inflating: test_epub/OEBPS/content.opf
      inflating: test_epub/OEBPS/Images/cover00464.jpeg
      inflating: test_epub/OEBPS/Images/image00456.jpeg
      inflating: test_epub/OEBPS/Images/image00457.jpeg
      inflating: test_epub/OEBPS/Images/image00458.jpeg
      inflating: test_epub/OEBPS/Images/image00459.jpeg
      inflating: test_epub/OEBPS/Images/image00460.jpeg
      inflating: test_epub/OEBPS/Images/image00461.jpeg
      inflating: test_epub/OEBPS/Images/image00462.jpeg
      inflating: test_epub/OEBPS/Images/image00463.jpeg
      inflating: test_epub/OEBPS/Styles/style0001.css
      inflating: test_epub/OEBPS/Styles/style0002.css
      inflating: test_epub/OEBPS/Styles/style0003.css
      inflating: test_epub/OEBPS/Styles/style0004.css
      inflating: test_epub/OEBPS/Styles/style0005.css
      inflating: test_epub/OEBPS/Styles/style0006.css
      inflating: test_epub/OEBPS/Styles/style0007.css
      inflating: test_epub/OEBPS/Styles/style0008.css
      inflating: test_epub/OEBPS/Styles/style0009.css
      inflating: test_epub/OEBPS/Text/cover_page.xhtml
      inflating: test_epub/OEBPS/Text/part0000.xhtml
      inflating: test_epub/OEBPS/Text/part0001.xhtml
      inflating: test_epub/OEBPS/Text/part0002.xhtml
      inflating: test_epub/OEBPS/Text/part0003.xhtml
      inflating: test_epub/OEBPS/Text/part0004.xhtml
      inflating: test_epub/OEBPS/Text/part0005.xhtml
      inflating: test_epub/OEBPS/Text/part0006.xhtml
      inflating: test_epub/OEBPS/Text/part0007.xhtml
      inflating: test_epub/OEBPS/Text/part0008.xhtml
      inflating: test_epub/OEBPS/Text/part0009.xhtml
      inflating: test_epub/OEBPS/Text/part0010.xhtml
      inflating: test_epub/OEBPS/Text/part0011.xhtml
      inflating: test_epub/OEBPS/Text/part0012.xhtml
      inflating: test_epub/OEBPS/Text/part0013.xhtml
      inflating: test_epub/OEBPS/Text/part0014.xhtml
      inflating: test_epub/OEBPS/Text/part0015.xhtml
      inflating: test_epub/OEBPS/Text/part0016.xhtml
      inflating: test_epub/OEBPS/Text/part0017.xhtml
      inflating: test_epub/OEBPS/Text/part0018.xhtml
      inflating: test_epub/OEBPS/Text/part0019.xhtml
      inflating: test_epub/OEBPS/Text/part0020.xhtml
      inflating: test_epub/OEBPS/Text/part0021.xhtml
      inflating: test_epub/OEBPS/Text/part0022.xhtml
      inflating: test_epub/OEBPS/Text/part0023.xhtml
      inflating: test_epub/OEBPS/Text/part0024.xhtml
      inflating: test_epub/OEBPS/Text/part0025.xhtml
      inflating: test_epub/OEBPS/Text/part0026.xhtml
      inflating: test_epub/OEBPS/Text/part0027.xhtml
      inflating: test_epub/OEBPS/Text/part0028.xhtml
      inflating: test_epub/OEBPS/Text/part0029.xhtml
      inflating: test_epub/OEBPS/Text/part0030.xhtml
      inflating: test_epub/OEBPS/Text/part0031.xhtml
      inflating: test_epub/OEBPS/Text/part0032.xhtml
      inflating: test_epub/OEBPS/Text/part0033.xhtml
      inflating: test_epub/OEBPS/Text/part0034.xhtml
      inflating: test_epub/OEBPS/Text/part0035.xhtml
      inflating: test_epub/OEBPS/Text/part0036.xhtml
      inflating: test_epub/OEBPS/Text/part0037.xhtml
      inflating: test_epub/OEBPS/Text/part0038.xhtml
      inflating: test_epub/OEBPS/Text/part0039.xhtml
      inflating: test_epub/OEBPS/Text/part0040.xhtml
      inflating: test_epub/OEBPS/Text/part0041.xhtml
      inflating: test_epub/OEBPS/Text/part0042.xhtml
      inflating: test_epub/OEBPS/Text/part0043.xhtml
      inflating: test_epub/OEBPS/Text/part0044.xhtml
      inflating: test_epub/OEBPS/Text/part0045.xhtml
      inflating: test_epub/OEBPS/Text/part0046.xhtml
      inflating: test_epub/OEBPS/Text/part0047.xhtml
      inflating: test_epub/OEBPS/Text/part0048.xhtml
      inflating: test_epub/OEBPS/toc.ncx

To show the file tree more clearly, we can also use the `tree` command (built into Windows; on Linux you need to install the `tree` package, either through a package manager or by compiling it yourself):

    [littleye233@lymjrolt test_epub]$ tree
    .
    ├── META-INF
    │   └── container.xml
    ├── mimetype
    └── OEBPS
        ├── content.opf
        ├── Images
        │   ├── cover00464.jpeg
        │   ├── image00456.jpeg
        │   ├── image00457.jpeg
        │   ├── image00458.jpeg
        │   ├── image00459.jpeg
        │   ├── image00460.jpeg
        │   ├── image00461.jpeg
        │   ├── image00462.jpeg
        │   └── image00463.jpeg
        ├── Styles
        │   ├── style0001.css
        │   ├── style0002.css
        │   ├── style0003.css
        │   ├── style0004.css
        │   ├── style0005.css
        │   ├── style0006.css
        │   ├── style0007.css
        │   ├── style0008.css
        │   └── style0009.css
        ├── Text
        │   ├── cover_page.xhtml
        │   ├── part0000.xhtml
        │   ├── part0001.xhtml
        │   ├── part0002.xhtml
        │   ├── part0003.xhtml
        │   ├── part0004.xhtml
        │   ├── part0005.xhtml
        │   ├── part0006.xhtml
        │   ├── part0007.xhtml
        │   ├── part0008.xhtml
        │   ├── part0009.xhtml
        │   ├── part0010.xhtml
        │   ├── part0011.xhtml
        │   ├── part0012.xhtml
        │   ├── part0013.xhtml
        │   ├── part0014.xhtml
        │   ├── part0015.xhtml
        │   ├── part0016.xhtml
        │   ├── part0017.xhtml
        │   ├── part0018.xhtml
        │   ├── part0019.xhtml
        │   ├── part0020.xhtml
        │   ├── part0021.xhtml
        │   ├── part0022.xhtml
        │   ├── part0023.xhtml
        │   ├── part0024.xhtml
        │   ├── part0025.xhtml
        │   ├── part0026.xhtml
        │   ├── part0027.xhtml
        │   ├── part0028.xhtml
        │   ├── part0029.xhtml
        │   ├── part0030.xhtml
        │   ├── part0031.xhtml
        │   ├── part0032.xhtml
        │   ├── part0033.xhtml
        │   ├── part0034.xhtml
        │   ├── part0035.xhtml
        │   ├── part0036.xhtml
        │   ├── part0037.xhtml
        │   ├── part0038.xhtml
        │   ├── part0039.xhtml
        │   ├── part0040.xhtml
        │   ├── part0041.xhtml
        │   ├── part0042.xhtml
        │   ├── part0043.xhtml
        │   ├── part0044.xhtml
        │   ├── part0045.xhtml
        │   ├── part0046.xhtml
        │   ├── part0047.xhtml
        │   └── part0048.xhtml
        └── toc.ncx

    5 directories, 72 files

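The same listing can be produced programmatically. The following is a minimal Python sketch using the standard `zipfile` module. It works on a small stand-in archive built in memory rather than on the author's `test.epub`, and `build_sample_epub`, `list_entries`, and `top_level` are hypothetical helper names introduced only for this example.

```python
import io
import zipfile


def build_sample_epub() -> bytes:
    """Build a tiny stand-in archive in memory (not the real test.epub)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", "<container/>")
        zf.writestr("OEBPS/content.opf", "<package/>")
        zf.writestr("OEBPS/Text/part0000.xhtml", "<html/>")
    return buf.getvalue()


def list_entries(data: bytes):
    """Return (uncompressed size, name) pairs in archive order."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [(info.file_size, info.filename) for info in zf.infolist()]


def top_level(names):
    """Collect the distinct first path components, preserving order."""
    seen = []
    for name in names:
        head = name.split("/", 1)[0]
        if head not in seen:
            seen.append(head)
    return seen


entries = list_entries(build_sample_epub())
for size, name in entries:
    print(f"{size:>9}  {name}")
print(top_level(name for _, name in entries))
```

For a real file, `zipfile.ZipFile("test.epub")` could be opened directly in place of the in-memory buffer.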
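Round IV. The files inside
At this point we can make some educated guesses:
`META-INF` folder: presumably holds the configuration files for the "container" (that is, this ePub file);
`mimetype` file: declares that the type of this file is "ePub" ("MIME" stands for "Multipurpose Internet Mail Extensions"; the "Extensions" there are extensions to the e-mail standard rather than file extensions, but a MIME type does serve as a standard label for a kind of content);
`OEBPS` folder: I don't know its exact meaning yet, but it should hold the ePub's text, images, and other presentation data;
`content.opf` file: presumably holds table-of-contents information, or defines the "order" of the various files;
`Images`, `Styles`, and `Text` folders: clearly hold images, cascading style sheets, and text data respectively;
`toc.ncx` file: possibly the real table of contents ("toc" is short for "table of contents").
Next we will go through them one by one.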
Round IV.I. The container
First, `META-INF/container.xml`:

    [littleye233@lymjrolt test_epub]$ file META-INF/container.xml
    META-INF/container.xml: XML 1.0 document, ASCII text

Print its contents:

    <?xml version="1.0" encoding="UTF-8"?>
    <container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
      <rootfiles>
        <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
      </rootfiles>
    </container>

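This is clearly a standard XML file. Note that `/container/rootfiles/rootfile/@full-path` [^1] points at the file we earlier took to be the table of contents. Since this part lends itself to standardization, the file is probably nearly identical across most ePub archives.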
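As a small illustration of how a reader might follow this pointer, here is a Python sketch that extracts the `full-path` attribute using the standard `xml.etree.ElementTree` module. The XML text is the `container.xml` shown above; `rootfile_paths` is a hypothetical helper name. Because the document declares a default namespace, the lookup has to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# The container.xml contents shown above.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# Prefix "c" is an arbitrary local alias for the document's default namespace.
NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def rootfile_paths(xml_text: str) -> list:
    """Return the full-path attribute of every rootfile element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [rf.get("full-path")
            for rf in root.findall("c:rootfiles/c:rootfile", NS)]


print(rootfile_paths(CONTAINER_XML))  # ['OEBPS/content.opf']
```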
Round IV.II. Pinning down the file type
Next, the `mimetype` file:

    [littleye233@lymjrolt test_epub]$ cat mimetype
    application/epub+zip

This one is also quite self-explanatory, so I won't elaborate.
Round IV.III. Table of contents?
Now `OEBPS/content.opf`:

    [littleye233@lymjrolt test_epub]$ file OEBPS/content.opf
    OEBPS/content.opf: XML 1.0 document, Unicode text, UTF-8 text, with very long lines (504)

This is an XML file too. Surprisingly, `file` can even tell that the longest line in this file is 504 characters long, which is frankly a little scary.
Full contents of `OEBPS/content.opf` (formatted):

    <?xml version="1.0" encoding="utf-8"?>
    <package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
      <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:title opf:file-as="kafeiguantuilishijianbuxilie(quansiben)">咖啡馆推理事件簿系列(全四本)</dc:title>
        <dc:language>zh</dc:language>
        <dc:identifier id="uid">3899198450</dc:identifier>
        <dc:creator opf:file-as="(ri)gangqizuomo">(日)冈崎琢磨</dc:creator>
        <dc:date opf:event="publication">2018-03-15</dc:date>
        <meta name="cover" content="x_cover-image"/>
        <meta name="output encoding" content="utf-8"/>
        <meta name="primary-writing-mode" content="horizontal-lr"/>
      </metadata>
      <manifest>
        <item id="x_cover" media-type="application/xhtml+xml" href="Text/cover_page.xhtml"/>
        <item id="x_TableOfContents" media-type="application/xhtml+xml" href="Text/part0000.xhtml"/>
        <item id="x_a1cover.html" media-type="application/xhtml+xml" href="Text/part0001.xhtml"/>
        <item id="x_a1bookname" media-type="application/xhtml+xml" href="Text/part0002.xhtml"/>
        <item id="x_a1TableOfContents" media-type="application/xhtml+xml" href="Text/part0003.xhtml"/>
        <item id="x_a1Chapter001" media-type="application/xhtml+xml" href="Text/part0004.xhtml"/>
        <item id="x_a1Chapter002" media-type="application/xhtml+xml" href="Text/part0005.xhtml"/>
        <item id="x_a1Chapter003" media-type="application/xhtml+xml" href="Text/part0006.xhtml"/>
        <item id="x_a1Chapter004" media-type="application/xhtml+xml" href="Text/part0007.xhtml"/>
        <item id="x_a1Chapter005" media-type="application/xhtml+xml" href="Text/part0008.xhtml"/>
        <item id="x_a1Chapter006" media-type="application/xhtml+xml" href="Text/part0009.xhtml"/>
        <item id="x_a1Chapter007" media-type="application/xhtml+xml" href="Text/part0010.xhtml"/>
        <item id="x_a1Chapter008" media-type="application/xhtml+xml" href="Text/part0011.xhtml"/>
        <item id="x_a1Chapter009" media-type="application/xhtml+xml" href="Text/part0012.xhtml"/>
        <item id="x_a1Chapter010" media-type="application/xhtml+xml" href="Text/part0013.xhtml"/>
        <item id="x_a2cover.html" media-type="application/xhtml+xml" href="Text/part0014.xhtml"/>
        <item id="x_a2bookname" media-type="application/xhtml+xml" href="Text/part0015.xhtml"/>
        <item id="x_a2TableOfContents" media-type="application/xhtml+xml" href="Text/part0016.xhtml"/>
        <item id="x_a2Chapter001" media-type="application/xhtml+xml" href="Text/part0017.xhtml"/>
        <item id="x_a2Chapter002" media-type="application/xhtml+xml" href="Text/part0018.xhtml"/>
        <item id="x_a2Chapter003" media-type="application/xhtml+xml" href="Text/part0019.xhtml"/>
        <item id="x_a2Chapter004" media-type="application/xhtml+xml" href="Text/part0020.xhtml"/>
        <item id="x_a2Chapter005" media-type="application/xhtml+xml" href="Text/part0021.xhtml"/>
        <item id="x_a2Chapter006" media-type="application/xhtml+xml" href="Text/part0022.xhtml"/>
        <item id="x_a2Chapter007" media-type="application/xhtml+xml" href="Text/part0023.xhtml"/>
        <item id="x_a2Chapter008" media-type="application/xhtml+xml" href="Text/part0024.xhtml"/>
        <item id="x_a2Chapter009" media-type="application/xhtml+xml" href="Text/part0025.xhtml"/>
        <item id="x_a2Chapter010" media-type="application/xhtml+xml" href="Text/part0026.xhtml"/>
        <item id="x_a3cover.html" media-type="application/xhtml+xml" href="Text/part0027.xhtml"/>
        <item id="x_a3bookname" media-type="application/xhtml+xml" href="Text/part0028.xhtml"/>
        <item id="x_a3TableOfContents" media-type="application/xhtml+xml" href="Text/part0029.xhtml"/>
        <item id="x_a3Chapter001" media-type="application/xhtml+xml" href="Text/part0030.xhtml"/>
        <item id="x_a3Chapter002" media-type="application/xhtml+xml" href="Text/part0031.xhtml"/>
        <item id="x_a3Chapter003" media-type="application/xhtml+xml" href="Text/part0032.xhtml"/>
        <item id="x_a3Chapter004" media-type="application/xhtml+xml" href="Text/part0033.xhtml"/>
        <item id="x_a3Chapter005" media-type="application/xhtml+xml" href="Text/part0034.xhtml"/>
        <item id="x_a3Chapter006" media-type="application/xhtml+xml" href="Text/part0035.xhtml"/>
        <item id="x_a3Chapter007" media-type="application/xhtml+xml" href="Text/part0036.xhtml"/>
        <item id="x_a3Chapter008" media-type="application/xhtml+xml" href="Text/part0037.xhtml"/>
        <item id="x_a3Chapter009" media-type="application/xhtml+xml" href="Text/part0038.xhtml"/>
        <item id="x_a4cover.html" media-type="application/xhtml+xml" href="Text/part0039.xhtml"/>
        <item id="x_a4bookname" media-type="application/xhtml+xml" href="Text/part0040.xhtml"/>
        <item id="x_a4TableOfContents" media-type="application/xhtml+xml" href="Text/part0041.xhtml"/>
        <item id="x_a4Chapter001" media-type="application/xhtml+xml" href="Text/part0042.xhtml"/>
        <item id="x_a4Chapter002" media-type="application/xhtml+xml" href="Text/part0043.xhtml"/>
        <item id="x_a4Chapter003" media-type="application/xhtml+xml" href="Text/part0044.xhtml"/>
        <item id="x_a4Chapter004" media-type="application/xhtml+xml" href="Text/part0045.xhtml"/>
        <item id="x_a4Chapter005" media-type="application/xhtml+xml" href="Text/part0046.xhtml"/>
        <item id="x_a4Chapter006" media-type="application/xhtml+xml" href="Text/part0047.xhtml"/>
        <item id="x_a4Chapter007" media-type="application/xhtml+xml" href="Text/part0048.xhtml"/>
        <item id="item50" media-type="text/css" href="Styles/style0001.css"/>
        <item id="item51" media-type="text/css" href="Styles/style0002.css"/>
        <item id="item52" media-type="text/css" href="Styles/style0003.css"/>
        <item id="item53" media-type="text/css" href="Styles/style0004.css"/>
        <item id="item54" media-type="text/css" href="Styles/style0005.css"/>
        <item id="item55" media-type="text/css" href="Styles/style0006.css"/>
        <item id="item56" media-type="text/css" href="Styles/style0007.css"/>
        <item id="item57" media-type="text/css" href="Styles/style0008.css"/>
        <item id="item58" media-type="text/css" href="Styles/style0009.css"/>
        <item id="item59" media-type="image/jpeg" href="Images/image00456.jpeg"/>
        <item id="item60" media-type="image/jpeg" href="Images/image00457.jpeg"/>
        <item id="item61" media-type="image/jpeg" href="Images/image00458.jpeg"/>
        <item id="item62" media-type="image/jpeg" href="Images/image00459.jpeg"/>
        <item id="item63" media-type="image/jpeg" href="Images/image00460.jpeg"/>
        <item id="item64" media-type="image/jpeg" href="Images/image00461.jpeg"/>
        <item id="item65" media-type="image/jpeg" href="Images/image00462.jpeg"/>
        <item id="item66" media-type="image/jpeg" href="Images/image00463.jpeg"/>
        <item id="x_cover-image" media-type="image/jpeg" href="Images/cover00464.jpeg"/>
        <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
      </manifest>
      <spine toc="ncx">
        <itemref idref="x_cover" linear="no"/>
        <itemref idref="x_TableOfContents" linear="yes"/>
        <itemref idref="x_a1cover.html" linear="yes"/>
        <itemref idref="x_a1bookname" linear="yes"/>
        <itemref idref="x_a1TableOfContents" linear="yes"/>
        <itemref idref="x_a1Chapter001" linear="yes"/>
        <itemref idref="x_a1Chapter002" linear="yes"/>
        <itemref idref="x_a1Chapter003" linear="yes"/>
        <itemref idref="x_a1Chapter004" linear="yes"/>
        <itemref idref="x_a1Chapter005" linear="yes"/>
        <itemref idref="x_a1Chapter006" linear="yes"/>
        <itemref idref="x_a1Chapter007" linear="yes"/>
        <itemref idref="x_a1Chapter008" linear="yes"/>
        <itemref idref="x_a1Chapter009" linear="yes"/>
        <itemref idref="x_a1Chapter010" linear="yes"/>
        <itemref idref="x_a2cover.html" linear="yes"/>
        <itemref idref="x_a2bookname" linear="yes"/>
        <itemref idref="x_a2TableOfContents" linear="yes"/>
        <itemref idref="x_a2Chapter001" linear="yes"/>
        <itemref idref="x_a2Chapter002" linear="yes"/>
        <itemref idref="x_a2Chapter003" linear="yes"/>
        <itemref idref="x_a2Chapter004" linear="yes"/>
        <itemref idref="x_a2Chapter005" linear="yes"/>
        <itemref idref="x_a2Chapter006" linear="yes"/>
        <itemref idref="x_a2Chapter007" linear="yes"/>
        <itemref idref="x_a2Chapter008" linear="yes"/>
        <itemref idref="x_a2Chapter009" linear="yes"/>
        <itemref idref="x_a2Chapter010" linear="yes"/>
        <itemref idref="x_a3cover.html" linear="yes"/>
        <itemref idref="x_a3bookname" linear="yes"/>
        <itemref idref="x_a3TableOfContents" linear="yes"/>
        <itemref idref="x_a3Chapter001" linear="yes"/>
        <itemref idref="x_a3Chapter002" linear="yes"/>
        <itemref idref="x_a3Chapter003" linear="yes"/>
        <itemref idref="x_a3Chapter004" linear="yes"/>
        <itemref idref="x_a3Chapter005" linear="yes"/>
        <itemref idref="x_a3Chapter006" linear="yes"/>
        <itemref idref="x_a3Chapter007" linear="yes"/>
        <itemref idref="x_a3Chapter008" linear="yes"/>
        <itemref idref="x_a3Chapter009" linear="yes"/>
        <itemref idref="x_a4cover.html" linear="yes"/>
        <itemref idref="x_a4bookname" linear="yes"/>
        <itemref idref="x_a4TableOfContents" linear="yes"/>
        <itemref idref="x_a4Chapter001" linear="yes"/>
        <itemref idref="x_a4Chapter002" linear="yes"/>
        <itemref idref="x_a4Chapter003" linear="yes"/>
        <itemref idref="x_a4Chapter004" linear="yes"/>
        <itemref idref="x_a4Chapter005" linear="yes"/>
        <itemref idref="x_a4Chapter006" linear="yes"/>
        <itemref idref="x_a4Chapter007" linear="yes"/>
      </spine>
      <tours>
      </tours>
      <guide>
        <reference type="text" title="Start" href="Text/part0004.xhtml"/>
        <reference type="toc" title="Table of Contents" href="Text/part0000.xhtml"/>
        <reference type="cover" title="Cover" href="Text/cover_page.xhtml"/>
      </guide>
    </package>

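So my earlier guess was not wrong: this file holds something that goes beyond a "table of contents", namely an "ordering", or more precisely an "index". It plays a role similar to `index.*` in other file formats or directory trees, assigning an identifier to each piece of data in the ePub. It also defines metadata such as the title, language, author, and publication date. As for the very long line noticed earlier, it appears to be some kind of hexadecimal watermark, perhaps for anti-piracy purposes.
Within it, `/package/manifest/item` enumerates every indexed resource together with its media type. `/package/spine/itemref` refers back to those items through `idref`, apparently in reading order, and can mark each one as "linear" or not; I did not dig further into it. `/package/guide/reference` points at landmarks such as the cover, for use by file managers and ePub readers (for example to show a preview page).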
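To see how the manifest and the spine fit together, here is a minimal Python sketch that resolves each spine `idref` to a path inside the archive. `SAMPLE_OPF` is a shortened excerpt assembled from the file above (only a few items are kept), and `reading_order` is a hypothetical helper name. Treating a missing `linear` attribute as "yes" is an assumption of this sketch.

```python
import posixpath
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Shortened excerpt of the content.opf shown above.
SAMPLE_OPF = """<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
  <manifest>
    <item id="x_cover" media-type="application/xhtml+xml" href="Text/cover_page.xhtml"/>
    <item id="x_TableOfContents" media-type="application/xhtml+xml" href="Text/part0000.xhtml"/>
    <item id="x_a1cover.html" media-type="application/xhtml+xml" href="Text/part0001.xhtml"/>
    <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="x_cover" linear="no"/>
    <itemref idref="x_TableOfContents" linear="yes"/>
    <itemref idref="x_a1cover.html" linear="yes"/>
  </spine>
</package>"""


def reading_order(opf_text: str, opf_path: str = "OEBPS/content.opf"):
    """Resolve spine itemrefs to (archive path, is_linear) pairs."""
    root = ET.fromstring(opf_text.encode("utf-8"))
    # hrefs in the manifest are relative to the folder holding the OPF file.
    base = posixpath.dirname(opf_path)
    hrefs = {item.get("id"): item.get("href")
             for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    order = []
    for ref in root.findall("opf:spine/opf:itemref", OPF_NS):
        path = posixpath.join(base, hrefs[ref.get("idref")])
        order.append((path, ref.get("linear", "yes") == "yes"))
    return order


for path, linear in reading_order(SAMPLE_OPF):
    print(path, "linear" if linear else "non-linear")
```

The manifest acts as a lookup table from ids to files, while the spine is just an ordered list of ids, which is consistent with the "index" plus "ordering" reading above.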
Round IV.IV. Table of contents!
Now `OEBPS/toc.ncx`:

    [littleye233@lymjrolt test_epub]$ file OEBPS/toc.ncx
    OEBPS/toc.ncx: XML 1.0 document, Unicode text, UTF-8 text

Discussing the file type hardly matters any more. Let's look at the contents again:
Full contents of `OEBPS/toc.ncx` (formatted):

    <?xml version="1.0" encoding="utf-8"?>
    <ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="zh">
      <head>
        <meta content="3899198450" name="dtb:uid"/>
        <meta content="2" name="dtb:depth"/>
        <meta content="mobiunpack.py" name="dtb:generator"/>
        <meta content="0" name="dtb:totalPageCount"/>
        <meta content="0" name="dtb:maxPageNumber"/>
      </head>
      <docTitle>
        <text>咖啡馆推理事件簿系列(全四本)</text>
      </docTitle>
      <navMap>
        <navPoint id="np_1" playOrder="1">
          <navLabel><text>总目录</text></navLabel>
          <content src="Text/part0000.xhtml"/>
        </navPoint>
        <navPoint id="np_2" playOrder="2">
          <navLabel><text>咖啡馆推理事件簿:下次见面时,请让我品尝你煮的咖啡</text></navLabel>
          <content src="Text/part0001.xhtml"/>
          <navPoint id="np_3" playOrder="3">
            <navLabel><text>序章</text></navLabel>
            <content src="Text/part0005.xhtml"/>
          </navPoint>
          <navPoint id="np_4" playOrder="4">
            <navLabel><text>一 事件始于第二次光顾</text></navLabel>
            <content src="Text/part0006.xhtml"/>
          </navPoint>
          <navPoint id="np_5" playOrder="5">
            <navLabel><text>二 Bittersweet Black</text></navLabel>
            <content src="Text/part0007.xhtml"/>
          </navPoint>
          <navPoint id="np_6" playOrder="6">
            <navLabel><text>三 隐藏在乳白色中的心</text></navLabel>
            <content src="Text/part0008.xhtml"/>
          </navPoint>
          <navPoint id="np_7" playOrder="7">
            <navLabel><text>四 棋盘上的狩猎</text></navLabel>
            <content src="Text/part0009.xhtml"/>
          </navPoint>
          <navPoint id="np_8" playOrder="8">
            <navLabel><text>五 past,present,f******?</text></navLabel>
            <content src="Text/part0010.xhtml"/>
          </navPoint>
          <navPoint id="np_9" playOrder="9">
            <navLabel><text>六 Animals in the closed room</text></navLabel>
            <content src="Text/part0011.xhtml"/>
          </navPoint>
          <navPoint id="np_10" playOrder="10">
            <navLabel><text>七 下次见面时,请让我品尝你煮的咖啡</text></navLabel>
            <content src="Text/part0012.xhtml"/>
          </navPoint>
          <navPoint id="np_11" playOrder="11">
            <navLabel><text>终章</text></navLabel>
            <content src="Text/part0013.xhtml"/>
          </navPoint>
        </navPoint>
        <navPoint id="np_12" playOrder="12">
          <navLabel><text>咖啡馆推理事件簿2:她梦到了欧蕾咖啡</text></navLabel>
          <content src="Text/part0014.xhtml"/>
          <navPoint id="np_13" playOrder="13">
            <navLabel><text>序曲 她的梦</text></navLabel>
            <content src="Text/part0018.xhtml"/>
          </navPoint>
          <navPoint id="np_14" playOrder="14">
            <navLabel><text>第一章 敬启致未来的你</text></navLabel>
            <content src="Text/part0019.xhtml"/>
          </navPoint>
          <navPoint id="np_15" playOrder="15">
            <navLabel><text>第二章 狐狸的迷惑</text></navLabel>
            <content src="Text/part0020.xhtml"/>
          </navPoint>
          <navPoint id="np_16" playOrder="16">
            <navLabel><text>第三章 打碎乳白色的心</text></navLabel>
            <content src="Text/part0021.xhtml"/>
          </navPoint>
          <navPoint id="np_17" playOrder="17">
            <navLabel><text>第四章 咖啡侦探蕾拉事件簿</text></navLabel>
            <content src="Text/part0022.xhtml"/>
          </navPoint>
          <navPoint id="np_18" playOrder="18">
            <navLabel><text>第五章 (She Wanted To Be)WANTED</text></navLabel>
            <content src="Text/part0023.xhtml"/>
          </navPoint>
          <navPoint id="np_19" playOrder="19">
            <navLabel><text>第六章 the Sky Occluded in the Sun</text></navLabel>
            <content src="Text/part0024.xhtml"/>
          </navPoint>
          <navPoint id="np_20" playOrder="20">
            <navLabel><text>第七章 在星空之下同命相连</text></navLabel>
            <content src="Text/part0025.xhtml"/>
          </navPoint>
          <navPoint id="np_21" playOrder="21">
            <navLabel><text>终章 她梦到了欧蕾咖啡</text></navLabel>
            <content src="Text/part0026.xhtml"/>
          </navPoint>
        </navPoint>
        <navPoint id="np_22" playOrder="22">
          <navLabel><text>咖啡馆推理事件簿3:扰人心神的咖啡</text></navLabel>
          <content src="Text/part0027.xhtml"/>
          <navPoint id="np_23" playOrder="23">
            <navLabel><text>序曲 五年前</text></navLabel>
            <content src="Text/part0031.xhtml"/>
          </navPoint>
          <navPoint id="np_24" playOrder="24">
            <navLabel><text>第一章 参加大赛</text></navLabel>
            <content src="Text/part0032.xhtml"/>
          </navPoint>
          <navPoint id="np_25" playOrder="25">
            <navLabel><text>第二章 前夜</text></navLabel>
            <content src="Text/part0033.xhtml"/>
          </navPoint>
          <navPoint id="np_26" playOrder="26">
            <navLabel><text>第三章 第一天</text></navLabel>
            <content src="Text/part0034.xhtml"/>
          </navPoint>
          <navPoint id="np_27" playOrder="27">
            <navLabel><text>第四章 第二天</text></navLabel>
            <content src="Text/part0035.xhtml"/>
          </navPoint>
          <navPoint id="np_28" playOrder="28">
            <navLabel><text>第五章 真相</text></navLabel>
            <content src="Text/part0036.xhtml"/>
          </navPoint>
          <navPoint id="np_29" playOrder="29">
            <navLabel><text>第六章 日后</text></navLabel>
            <content src="Text/part0037.xhtml"/>
          </navPoint>
          <navPoint id="np_30" playOrder="30">
            <navLabel><text>尾声 五年前</text></navLabel>
            <content src="Text/part0038.xhtml"/>
          </navPoint>
        </navPoint>
        <navPoint id="np_31" playOrder="31">
          <navLabel><text>咖啡馆推理事件簿4:休闲时光的五种风味</text></navLabel>
          <content src="Text/part0039.xhtml"/>
          <navPoint id="np_32" playOrder="32">
            <navLabel><text>午后三点前的无聊风景</text></navLabel>
            <content src="Text/part0043.xhtml"/>
          </navPoint>
          <navPoint id="np_33" playOrder="33">
            <navLabel><text>帕列塔之恋</text></navLabel>
            <content src="Text/part0044.xhtml"/>
          </navPoint>
          <navPoint id="np_34" playOrder="34">
            <navLabel><text>消失的礼物飞镖</text></navLabel>
            <content src="Text/part0045.xhtml"/>
          </navPoint>
          <navPoint id="np_35" playOrder="35">
            <navLabel><text>可视化的原生艺术</text></navLabel>
            <content src="Text/part0046.xhtml"/>
          </navPoint>
          <navPoint id="np_36" playOrder="36">
            <navLabel><text>在塔列兰咖啡馆的庭院里</text></navLabel>
            <content src="Text/part0047.xhtml"/>
          </navPoint>
          <navPoint id="np_37" playOrder="37">
            <navLabel><text>特别篇 如释重负</text></navLabel>
            <content src="Text/part0048.xhtml"/>
          </navPoint>
        </navPoint>
      </navMap>
    </ncx>

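Let's turn to the more important part, the definition of the "table of contents". To make it easier to see, I'll be a bit lazy and look at it in the reader that ships with my desktop environment.
It shows a two-level table of contents, which matches the definition in `OEBPS/toc.ncx` exactly. The important attributes there can mostly be understood from their names, so I won't study them further here.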
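The two-level structure can also be read straight from the nesting of `navPoint` elements. Below is a minimal Python sketch that walks them recursively. `SAMPLE_NCX` is a shortened excerpt of the file above (three entries only), and `walk` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NCX_NS = {"n": "http://www.daisy.org/z3986/2005/ncx/"}

# Shortened excerpt of the toc.ncx shown above.
SAMPLE_NCX = """<?xml version="1.0" encoding="utf-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="zh">
  <navMap>
    <navPoint id="np_1" playOrder="1">
      <navLabel><text>总目录</text></navLabel>
      <content src="Text/part0000.xhtml"/>
    </navPoint>
    <navPoint id="np_2" playOrder="2">
      <navLabel><text>咖啡馆推理事件簿:下次见面时,请让我品尝你煮的咖啡</text></navLabel>
      <content src="Text/part0001.xhtml"/>
      <navPoint id="np_3" playOrder="3">
        <navLabel><text>序章</text></navLabel>
        <content src="Text/part0005.xhtml"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>"""


def walk(parent, depth=0):
    """Yield (depth, label, src) for each navPoint, depth-first."""
    # findall without a leading ".//" only matches direct children,
    # so the nesting level is tracked by the recursion itself.
    for point in parent.findall("n:navPoint", NCX_NS):
        label = point.find("n:navLabel/n:text", NCX_NS).text
        src = point.find("n:content", NCX_NS).get("src")
        yield depth, label, src
        yield from walk(point, depth + 1)


root = ET.fromstring(SAMPLE_NCX.encode("utf-8"))
outline = list(walk(root.find("n:navMap", NCX_NS)))
for depth, label, src in outline:
    print("  " * depth + f"{label} -> {src}")
```

The deepest entry sits at depth 1, i.e. two levels in total, in line with the `dtb:depth` value of 2 in the file's `head`.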
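Round IV.V. The rest
What remains is the images, the text, and the cascading style sheets. Although this part makes up the largest share of the ePub file and is arguably the most important, its content is so straightforward that going any further would turn into an HTML and CSS refresher, so I'll skip it as well.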
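Summary
Based on the brief exploration above, ePub is a file format that is essentially a compressed archive, uses XML for its configuration files, and contains data such as images and text. Consulting the relevant references afterwards confirms that its actual nature is close to this analysis.
Through this analysis we also got a first taste of the patterns and techniques for analyzing an unfamiliar file format, which can be applied to more complex formats later on.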
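As a recap, the chain of lookups found in this post (archive, then `mimetype`, then `META-INF/container.xml`, then the package file and its spine) can be strung together in a few lines of Python. This is only a sketch under stated assumptions: it runs on a tiny stand-in archive built in memory, the title "Sample Book" and the helper names `build_sample_epub` and `describe_epub` are made up for illustration, and no error handling for malformed files is attempted.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf",
          "dc": "http://purl.org/dc/elements/1.1/"}

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# Hypothetical minimal package file; the title is made up.
OPF_XML = """<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Book</dc:title>
  </metadata>
  <manifest>
    <item id="toc" media-type="application/xhtml+xml" href="Text/part0000.xhtml"/>
  </manifest>
  <spine>
    <itemref idref="toc"/>
  </spine>
</package>"""


def build_sample_epub() -> bytes:
    """Build a tiny stand-in archive in memory (not the real test.epub)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", OPF_XML)
        zf.writestr("OEBPS/Text/part0000.xhtml", "<html/>")
    return buf.getvalue()


def describe_epub(data: bytes) -> dict:
    """Follow mimetype -> container.xml -> package file -> spine."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        mimetype = zf.read("mimetype").decode("ascii").strip()
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(
            "c:rootfiles/c:rootfile", CONTAINER_NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))
        title = package.find("opf:metadata/dc:title", OPF_NS).text
        base = posixpath.dirname(opf_path)
        hrefs = {i.get("id"): i.get("href")
                 for i in package.findall("opf:manifest/opf:item", OPF_NS)}
        spine = [posixpath.join(base, hrefs[r.get("idref")])
                 for r in package.findall("opf:spine/opf:itemref", OPF_NS)]
        names = set(zf.namelist())
        missing = [p for p in spine if p not in names]
    return {"mimetype": mimetype, "opf": opf_path, "title": title,
            "spine": spine, "missing": missing}


info = describe_epub(build_sample_epub())
print(info)
```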
Finally, don't forget to change that ePub file's name back XD:

    [littleye233@lymjrolt Downloads]$ mv test.epub '咖啡馆推理事件簿系列(全四本).epub'

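[End]
Footnotes
[^1]: This is XPath syntax, used to describe the location of elements in XML-like files; similar cases later in the text are not noted again.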