[1] Hoyt Long and Richard Jean So, “Literary Pattern Recognition: Modernism between Close Reading and Machine Learning,” in Critical Inquiry 42:2 (2016), pp. 235-267. The University of Chicago Press. Translated and reprinted with permission of The University of Chicago Press.
[2] Franco Moretti, Distant Reading, London, 2013; Matthew L. Jockers, Macroanalysis: Digital Methods and Literary History, Urbana, Ill., 2013; Matthew Wilkens, “The Geographic Imagination of Civil War-Era American Fiction,” in American Literary History 25 (Winter 2013), pp. 803-40; and Andrew Piper and Mark Algee-Hewitt, “The Werther Effect I: Goethe, Objecthood, and the Handling of Knowledge,” in Matt Erlin and Lynn Tatlock (eds.), Distant Readings: Topologies of German Culture in the Long Nineteenth Century, Rochester, N.Y., 2014, pp. 155-84.
[3] For recent critiques by Galloway, Golumbia, and McPherson, see the special issue of the journal Differences on the theme “In the Shadows of the Digital Humanities.” Their essays are: Alexander Galloway, “The Cybernetic Hypothesis,” in Differences 25, No. 1 (2014), pp. 107-31; David Golumbia, “Death of a Discipline,” in Differences 25, No. 1 (2014), pp. 156-76; and Tara McPherson, “Designing for Difference,” in Differences 25, No. 1 (2014), pp. 177-88. See also Alan Liu, “Where is Cultural Criticism in the Digital Humanities?” in Debates in the Digital Humanities, Matthew K. Gold (ed.), Minneapolis, 2012, pp. 490-509.
[4] Ted Underwood, “Theorizing Research Practices We Forgot to Theorize Twenty Years Ago,” in Representations 127 (Summer 2014), p. 65.
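[5] “Black box” is a common concept in science and engineering. It refers to a program whose mechanism cannot be fully controlled or observed, so that only its inputs and outputs can be known. (Translator's note.)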
[6] See the excellent series of pamphlets issued by this team at the Stanford Literary Lab, Pamphlets.
[7] Liu, “Where is Cultural Criticism in the Digital Humanities?” p. 501.
[8] Golumbia, “Death of a Discipline,” p. 172.
[9] The concluding section clarifies the precise sense in which we use this term.
[10] Ezra Pound, “April,” in Personae: Collected Shorter Poems of Ezra Pound, London, 1952, p. 101.
[11] See Earl Miner, The Japanese Tradition in British and American Literature, Princeton, N.J., 1958, p. 125; hereafter abbreviated JT.
[12] William Carlos Williams, “Marriage,” in The Collected Poems of William Carlos Williams, A. Walton Litz and Christopher MacGowan (eds.), 2 vols. (New York, 1986-1988), 1:56.
[13] Charles Altieri, The Art of Twentieth-Century American Poetry: Modernism and After, Malden, Mass., 2006, p. 41.
[14] Ibid., p. 23.
[15] Helen Vendler, Wallace Stevens: Words Chosen Out of Desire, Knoxville, Tenn., 1984, p. 8.
[16] See Marjorie Perloff, The Dance of the Intellect: Studies in the Poetry of the Pound Tradition, Evanston, Ill., 1996.
[17] Peter Nicholls, “The Poetics of Modernism,” in The Cambridge Companion to Modernist Poetry, Alex Davis and Lee M. Jenkins (eds.), New York, 2007, p. 61.
[18] Ibid., pp. 6, 61.
[19] Jeffrey Johnson, Haiku Poetics in Twentieth Century Avant-Garde Poetry, Lanham, Md., 2011, pp. 69, 68.
[20] Vendler, Wallace Stevens, p. 57.
[21] Laura Riding and Robert Graves, A Survey of Modernist Poetry (London, 1927), pp. 216, 217, 218.
[22] Ibid., p. 217.
[23] Douglas Mao and Rebecca L. Walkowitz, “The New Modernist Studies,” PMLA 123 (May 2008), p. 737.
[24] See, for example, Lawrence Rainey, Institutions of Modernism: Literary Elites and Public Culture, New Haven, Conn., 1999, and Andrew Goldstone, Fictions of Autonomy: Modernism from Wilde to de Man, New York, 2013. On institutional contexts, see also the discussions of modernism's relation to modern media forms in Mark Wollaeger, Modernism, Media, and Propaganda: British Narrative from 1900 to 1945, Princeton, N.J., 2008, and Mark Goble, Beautiful Circuits: Modernism and the Mediated Life, New York, 2010.
[25] The scholars referred to above are Christopher Bush, Robert Kern, Eric Hayot, and Steven Yao.
[26] See Christopher Bush, “Modernism, Orientalism, and East Asia,” in A Handbook of Modernism Studies, ed. Jean-Michel Rabaté, Malden, Mass., 2013, pp. 193-208; Robert Kern, Orientalism, Modernism, and the American Poem, New York, 1996; Eric Hayot, Chinese Dreams: Pound, Brecht, Tel Quel, Ann Arbor, Mich., 2004; Steven G. Yao, Translation and the Languages of Modernism: Gender, Politics, Language, New York, 2002; and Zhaoming Qian, Orientalism and Modernism: The Legacy of China in Pound and Williams, Durham, N.C., 1995.
[27] Kern, Orientalism, Modernism, and the American Poem, p. 175.
[28] At the time, the terms “hokku” and “haikai” were more commonly used to refer to this form. Although both are treated as synonyms of “haiku,” strictly speaking they differ. “Hokku” refers to the opening sequence of 5-7-5 syllables in what was historically a much longer series of linked verse. “Haikai” refers specifically to a particular tradition of such linked verse, one that dates back to the early seventeenth century. “Haiku” is a term coined by the poet Masaoka Shiki in the 1890s to set these verses apart as independent poetic units.
[29] W. G. Aston, A History of Japanese Literature, New York, 1899, p. 294.
[30] Basil Hall Chamberlain, “Bashô and the Japanese Poetical Epigram,” in Transactions of the Asiatic Society of Japan 30, no. 2 (1902), p. 243.
[31] Aston, A History of Japanese Literature, p. 294.
[32] Chamberlain, “Bashô and the Japanese Poetical Epigram,” pp. 245, 305.
[33] Lafcadio Hearn, In Ghostly Japan, Boston, 1899, p. 154.
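[34] For example, some used the language of haiku translations (especially phrases such as temple bells, tiny flowers, and hovering insects) to describe the ideal effect of haiku on the reader; see ibid., and Chamberlain, “Bashô and the Japanese Poetical Epigram,” p. 309. Similarly, Paul-Louis Couchoud wrote in an influential essay of 1906 that the meaning of a haiku drifts toward us “like the sound of a harp behind a screen or the scent of pear blossoms through the mist” (Paul-Louis Couchoud, “The Lyric Epigrams of Japan,” in Japanese Impressions: With a Note on Confucius, Frances Rumsey (trans.), London, 1921, p. 38).
[35] F. S. Flint, “The History of Imagism,” in The Egoist 2 (May 1915), p. 71. That the poet associates tanka and haikai with free verse shows that he was unaware of how strict the syllabic structure of these forms is in compositional practice. It also hints at a broader tendency to blur the distinction between the two, a point we examine in the next section.
[36] One critic even claimed that “Japanese hokku poetry is undoubtedly the model on which the first Imagist anthology was built, especially Mr. Pound's contributions to it” (George Lane, “Some Imagist Poets,” in The Little Review 2 (May 1915), p. 27).
[37] Lowell sought in her adaptations to “keep the brevity and suggestion of the hokku, and to preserve it within its natural sphere” (quoted in JT, p. 165). Fletcher admired the haiku's use of “universal emotion derived from a natural fact” and its expression of that emotion “in the fewest possible words” (quoted in JT, p. 177).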
[38] Yone Noguchi, “What is a Hokku Poem?” in Rhythm 2 (Jan. 1913), p. 355.
[39] Noguchi, The Spirit of Japanese Poetry, London, 1914, pp. 42–43, 51. On Noguchi's influence on early adopters such as Pound, see Edward Marx, “A Slightly Open Door: Yone Noguchi and the Invention of English Haiku,” in Genre 39 (Fall 2006), pp. 107–26.
[40] Johnson, Haiku Poetics in Twentieth Century Avant-Garde Poetry, p. 45.
[41] Royall Snow, “Marriage with the East,” in The New Republic, 29 (June 1921), p. 138. See also Torao Taketomo, “American Imitations of Japanese Poetry,” in The Nation, 17 Jan. 1920, p. 70.
[42] Marjorie Allen Seiffert, “The Floating World,” review of Pictures of the Floating World by Amy Lowell, in Poetry 15 (Mar. 1920), p. 334.
[43] John Livingston Lowes, Convention and Revolt in Poetry, Boston, 1919, pp. 166, 309.
[44] “Hoch der Hokku!” in Siren (Sept. 1921), p. 10.
[45] Taketomo, “American Imitations of Japanese Poetry,” p. 71.
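[46] Snow, “Marriage with the East,” p. 138. Snow cites a 1914 essay by Pound, which states that “in the coming century we cannot escape ... the ever greater changes wrought upon our established standards by the intense acuity of Eastern thought and the condensed forms of Eastern literature.” Amy Lowell is also mentioned, particularly her remark that “suggestion is one of the important things we have learned from the East” (p. 138).
[47] Jay Hubbell and John O. Beaty regard the haiku as part of “the great and still growing influence of Asian poetry on contemporary poetry,” “[an influence] that has brought greater concision and polish” (Jay Hubbell and John O. Beaty, An Introduction to Poetry, New York, 1922, p. 360).
[48] Jun Fujita, “A Japanese Cosmopolite,” review of Seen and Unseen: Or Monologues of a Homeless Snail and Selected Poems of Yone Noguchi by Noguchi, Poetry 20 (June 1922), p. 164.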
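[49] This kind of filtering has been one of the most common uses of machine learning since its rise in the early 1990s. It is more effective and efficient than older methods of text classification, which relied on human experts to define by hand classification rules closely tied to whatever texts they were analyzing. With the development of machine learning, experts can leave the derivation of rules to the machine and concentrate on identifying the categories themselves. Fabrizio Sebastiani gives a comprehensive history of machine learning in the field of information systems; see “Machine Learning in Automated Text Categorization,” in ACM Computing Surveys 34 (Mar. 2002), pp. 1–47.
[50] See Stephen Ramsay, “In Praise of Pattern,” in TEXT Technology 14, No. 2 (2005), pp. 177–90; Bradley Pasanek and D. Sculley, “Meaning and Mining: The Impact of Implicit Assumptions in Data Mining for the Humanities,” in Literary and Linguistic Computing 23 (Dec. 2008), pp. 409–24; Shlomo Argamon et al., “Gender, Race, and Nationality in Black Drama, 1950–2006: Mining Differences in Language Use in Authors and Their Characters,” in Digital Humanities Quarterly 3, No. 2 (2009); Matthew Jockers, Macroanalysis: Digital Methods and Literary History, Urbana, Ill., 2013, chap. 6.
[51] Ted Underwood et al., “Mapping Mutable Genres in Structurally Complex Volumes,” paper presented at the 2013 IEEE International Conference on Big Data (unpublished), Santa Clara, Calif., 6-9 Oct. 2013; David Bamman, Ted Underwood, and Noah Smith, “A Bayesian Mixed Effects Model of Literary Character,” paper submitted to the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, 22–27 June 2014.
[52] For an explanation and demonstration of multiclass text classification, see Jockers, Macroanalysis, chap. 6.
[53] Sebastiani, “Machine Learning in Automated Text Categorization,” p. 3.
[54] By contrast, “unsupervised methods” let the machine first decide how documents might cluster according to certain specified features; whether those clusters correspond to meaningful categories is left for the user to decide. For a helpful explanation, see Jockers, Macroanalysis, pp. 70–71.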
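[55] In Japan, haiku and tanka are naturally associated with quite distinct aesthetic orientations, artistic lineages, and stylistic and social markers. Traditionally, haiku are devoted to describing the natural world or offering philosophical and social commentary, while tanka are associated with the expression of mood and feeling. These fine distinctions, however, were usually overlooked by American poets and critics, with the result that the two were often lumped together as part of a single Japanese poetic tradition.
[56] These poems were collected from the HathiTrust Digital Library and the Modernist Journals Project. Because these collections are limited to works in the public domain, we could only gather poems published before 1923. For the Harlem Renaissance journals, we entered the poems by hand from the original issues.
[57] Justin Grimmer and Brandon Stewart, “Text as Data: The Promise and Pitfalls of Automated Content Analysis Methods for Political Texts,” in Political Analysis 21 (Summer 2013), p. 270.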
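[58] In a binary bag-of-words approach, a word is represented by its presence or absence; on its advantages, see Pasanek and Sculley, “Meaning and Mining,” p. 413; and Bei Yu, “An Evaluation of Text Classification Methods for Literary Study,” in Literary and Linguistic Computing 23 (Sept. 2008), pp. 329–30. Both discuss the question of when function words should be included. Function words can help identify authorial style. See also Jockers, Macroanalysis, p. 64. We also ignored capitalization and removed all punctuation, retaining only the exclamation marks and long dashes that occur frequently in the haiku texts.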
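The representation described in this note can be sketched roughly as follows. This is an illustration only, not the authors' code: the tokenization rule, the function names, and the sample line are all assumptions made for the example.

```python
import re

# Assumed rule: lowercase everything and drop punctuation, but keep "!" and
# the long dash (U+2014) as tokens, mirroring the description in the note.
TOKEN_PATTERN = re.compile("[a-z']+|!|\u2014")

def binary_bag_of_words(text):
    """Return the set of distinct tokens in `text` (presence or absence only)."""
    return set(TOKEN_PATTERN.findall(text.lower()))

def to_vector(text, vocabulary):
    """Map a text to a 0/1 vector over a fixed, ordered vocabulary."""
    present = binary_bag_of_words(text)
    return [1 if word in present else 0 for word in vocabulary]

# Hypothetical sample line; repeated words count only once.
sample = "The petals fall \u2014 the petals fall!"
features = binary_bag_of_words(sample)
vocabulary = sorted(features | {"moon"})
vector = to_vector(sample, vocabulary)
```

Because only presence is recorded, a word repeated many times in a short poem weighs no more than a word used once.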
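[59] The latter task is usually called feature selection; it helps reduce the statistical noise produced by large numbers of low-frequency features. We can also omit words that occur many times in both classes of text, which reduces the number of features from the other direction.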
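A minimal sketch of the two filters mentioned in this note, assuming each poem is already a set of words. The cut-offs `min_docs` and `max_share` are hypothetical values chosen for the illustration; the note does not state the thresholds actually used.

```python
from collections import Counter

def document_frequency(docs):
    """Count, for each word, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))
    return counts

def select_features(class_a, class_b, min_docs=2, max_share=0.5):
    """Keep words that are neither too rare overall nor common to both classes.

    min_docs and max_share are hypothetical cut-offs used for illustration.
    """
    df_a, df_b = document_frequency(class_a), document_frequency(class_b)
    selected = set()
    for word in set(df_a) | set(df_b):
        too_rare = df_a[word] + df_b[word] < min_docs
        shared = (df_a[word] / len(class_a) > max_share
                  and df_b[word] / len(class_b) > max_share)
        if not too_rare and not shared:
            selected.add(word)
    return selected

# Toy corpora (invented for the example).
haiku = [{"the", "moon", "pond"}, {"the", "moon", "frog"}, {"the", "blossom"}]
other = [{"the", "street", "smoke"}, {"the", "street", "crowd"}, {"the", "train"}]
kept = select_features(haiku, other)
```

In this toy case "the" is dropped because it is frequent in both classes, and words that appear in a single poem are dropped as too rare.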
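[60] Syllable counts for the poems were derived from user input together with the Carnegie Mellon University (CMU) Pronouncing Dictionary of American English. We then examined the distribution of syllable counts in the two haiku corpora, translations and adaptations, and used the results to set cut-off points. For the translations, we used 18 syllables as the threshold, and each text was represented as falling either above or below that number.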
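The thresholding step in this note might look like the sketch below. The small `SYLLABLES` table is a hypothetical stand-in for a pronouncing-dictionary lookup plus manual entries, and treating "above" as strictly greater than 18 is an assumption; the note does not say how a text of exactly 18 syllables was handled.

```python
# Hypothetical lookup table; real counts would come from a pronouncing
# dictionary, with manual entry for words the dictionary lacks.
SYLLABLES = {
    "an": 1, "old": 1, "pond": 1, "a": 1, "frog": 1,
    "jumps": 1, "in": 1, "water": 2, "sound": 1,
}

def syllable_count(words, lookup):
    """Sum the syllables of each word; raises KeyError for unknown words."""
    return sum(lookup[word] for word in words)

def above_threshold(words, lookup, threshold=18):
    """Binary feature: True if the poem has more syllables than the threshold."""
    return syllable_count(words, lookup) > threshold

poem = ["an", "old", "pond", "a", "frog", "jumps", "in", "water", "sound"]
total = syllable_count(poem, SYLLABLES)
is_long = above_threshold(poem, SYLLABLES)
```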
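[61] For this point, see Pasanek and Sculley, “Meaning and Mining,” p. 412. The first group of methods includes linear models such as support vector machines (SVM) and logistic regression; the second group includes the naive Bayes algorithm and hidden Markov models; the last group includes decision-tree classifiers. For a detailed description of all these methods, see Sebastiani, “Machine Learning in Automated Text Categorization.”
[62] For a fuller introduction to this classifier, see Grimmer and Stewart, “Text as Data,” p. 11. Its “naive” character lies in its core statistical assumption that, within a given class of texts, each word is generated independently of the others. This is plainly false, since word use is usually highly correlated within a set of similar texts. Nevertheless, this simple method has proven very effective in certain kinds of text classification.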
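To make the independence assumption concrete, here is a small naive Bayes sketch over presence/absence features. It is a Bernoulli variant with add-one smoothing, chosen here to match a binary bag of words; whether the authors used this exact variant is not stated in the note, and the toy poems and labels are invented.

```python
import math

def train_naive_bayes(docs, labels, vocabulary):
    """Estimate class priors and per-class word probabilities (add-one smoothing)."""
    model = {}
    for label in set(labels):
        class_docs = [d for d, l in zip(docs, labels) if l == label]
        prior = len(class_docs) / len(docs)
        word_prob = {
            w: (sum(w in d for d in class_docs) + 1) / (len(class_docs) + 2)
            for w in vocabulary
        }
        model[label] = (prior, word_prob)
    return model

def classify(doc, model, vocabulary):
    """Pick the class with the highest log-probability.

    Each word contributes its own factor, independently of every other word:
    this is the "naive" assumption.
    """
    def log_score(label):
        prior, word_prob = model[label]
        score = math.log(prior)
        for w in vocabulary:
            p = word_prob[w]
            score += math.log(p if w in doc else 1 - p)
        return score
    return max(model, key=log_score)

# Toy training data (invented for the example).
docs = [{"moon", "pond"}, {"moon", "frog"}, {"street", "smoke"}, {"street", "crowd"}]
labels = ["haiku", "haiku", "other", "other"]
vocabulary = sorted(set().union(*docs))
model = train_naive_bayes(docs, labels, vocabulary)
prediction = classify({"moon", "blossom"}, model, vocabulary)
```

Words outside the training vocabulary, such as "blossom" here, are simply ignored by this sketch.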
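[63] See Yu, “An Evaluation of Text Classification Methods for Literary Study,” p. 336. In machine-learning analyses of literary texts, naive Bayes is often compared with support vector machines (SVM). See Argamon et al., “Gender, Race, and Nationality in Black Drama, 1950–2006”; Pasanek and Sculley, “Meaning and Mining”; Yu, “An Evaluation of Text Classification Methods for Literary Study.” Support vector machines tend to isolate higher-frequency words as influential features.
[64] All poem excerpts are taken from Carl Sandburg, Chicago Poems, New York, 1916. We selected only poems that explicitly treat urban themes or depict city dwellers.
[65] Specifically, we performed four-fold cross-validation, using three-quarters of the combined sample as training data and the remaining quarter as test data.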
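The splitting scheme in this note can be sketched as follows. The shuffling, the seed, and the function name are assumptions for the illustration; the note specifies only the three-quarters to one-quarter division.

```python
import random

def four_fold_splits(n_items, seed=0):
    """Yield (train_indices, test_indices) for each of four folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumed: shuffle before splitting
    folds = [indices[i::4] for i in range(4)]
    for k in range(4):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

# With 20 items, each fold tests on 5 and trains on the other 15.
splits = list(four_fold_splits(20))
```

Every item serves as test data exactly once, so the four accuracy scores can be averaged into a single estimate.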
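[66] All of these accuracy scores are highly statistically significant, based on randomized tests performed for each classification set. The scores ranged from 54% to 64%, whereas the ideal score for this kind of test is 50%, a score at which the machine's ability to guess correctly is no better than a coin toss.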
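[67] In machine learning, “precision” measures the exactness of a classifier, indicating how often the classifier is right when it assigns a text to a given class. High precision means that highly distinctive features have been found in a class of texts. “Recall” measures the completeness or sensitivity of a classifier, indicating how many texts of a particular class the classifier correctly identifies. Low recall means that texts of that class more often omit the features that would identify their class.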
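The two measures defined in this note follow directly from counts of correct and incorrect guesses. The sketch below uses invented labels to show a classifier with high precision but low recall.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Compute precision and recall for one target class."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(t == target and p == target for t, p in pairs)
    false_pos = sum(t != target and p == target for t, p in pairs)
    false_neg = sum(t == target and p != target for t, p in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Invented example: the classifier labels only one poem "haiku" and is right,
# but it misses two of the three actual haiku.
true_labels = ["haiku", "haiku", "haiku", "other", "other", "other"]
predicted = ["haiku", "other", "other", "other", "other", "other"]
precision, recall = precision_recall(true_labels, predicted, "haiku")
```

Here precision is perfect while recall is one third: the classifier is never wrong when it says "haiku," yet most haiku lack the features it relies on.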
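[68] For some journals the opposite occurred: accuracy fell because more haiku were misclassified as non-haiku. Although an analysis of these errors lies beyond the scope of this essay, we should note that these results tell us something important about the composition of short poems in these journals. Based solely on where these poems were published, we provisionally treat them as a single category distinct from haiku, even though they are in fact internally diverse, each in its own way.
[69] By comparison, when we used the more precise haiku model, only 45 poems were misclassified.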
[70] Richard Aldington, “Epigram,” in The Little Review 3 (Mar. 1916), p. 29.
[71] George Briggs, “Evelyn,” in The Smart Set 52 (Aug. 1917), p. 28.
[72] Sarsfield Young, “Poem of Nature,” in The Smart Set 50 (Dec. 1916), p. 104.
[73] This is the problem that Shlomo Argamon and Mark Olsen call the “lowest common denominator.” Classification algorithms tend to lean heavily on a small subset of all features, a subset that is neither salient enough nor intellectually fair to the complexity of literary works (see Shlomo Argamon and Mark Olsen, “Words, Patterns and Documents: Experiments in Machine Learning and Text Analysis,” in Digital Humanities Quarterly 3, no. 2, 2009).
[74] Anna Porter, “A Sierra Juniper,” in Lyric West 1, No. 4 (1921), p. 18.
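[75] Several poems by 罗尔斯 were misclassified, including 《致武士》, 《日落》, and 《艺伎与古筝》, all published in 1922. The reference to Zhuangzi comes from 兰利's 《四月幻觉》, also published in 1922. After examining all the misclassified poems, we found that roughly 20% were candidate haiku, 40% were machine haiku, and the remaining 40% fell somewhere in between.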
[76] Quoted in Janine Beichman, Masaoka Shiki: His Life and Works, Boston, 2002, p. 35. Shiki here borrows from “a contemporary scholar versed in mathematics” to support the argument that the haiku was approaching its end (p. 35).
[77] Tristan Tzara, “To Make a Dadaist Poem” (1920); Seven Dada Manifestos, in “Seven Dada Manifestos” and “Lampisteries,” Barbara Wright (trans.), London, 1977, p. 39.
[78] Johanna Drucker, The Visible Word: Experimental Typography and Modern Art, 1909-1923, Chicago, 1994, p. 114.