
The Economist, Science & Technology || The Cost Problem of Machine Learning

一天一篇经济学人 (One Economist Article a Day)

1


Introduction

Thanks to the mind-map author:

Jennie, female, a humble MTI student and sociology enthusiast


2


Listening | Close Reading | Translation | Phrases

The cost of training machines is becoming a problem

The English text is taken from the Technology Quarterly section of The Economist, June 11th 2020 issue.


Increased complexity and competition are part of it



The fundamental assumption of the computing industry is that number-crunching gets cheaper all the time. Moore’s law, the industry’s master metronome, predicts that the number of components that can be squeezed onto a microchip of a given size—and thus, loosely, the amount of computational power available at a given cost—doubles every two years.
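A back-of-envelope way to see what that doubling implies: the short Python sketch below (illustrative numbers only, not real industry figures) computes how much more computation a fixed budget buys if capability per dollar doubles every two years.

```python
# Illustrative arithmetic only: relative compute a fixed budget buys
# if capability per dollar doubles every two years (Moore's law).

def compute_per_dollar(years, doubling_period_years=2.0):
    """Compute available per dollar after `years`, normalised to 1.0 today."""
    return 2 ** (years / doubling_period_years)

for y in (0, 2, 4, 6, 8, 10):
    print(f"after {y:2d} years: {compute_per_dollar(y):5.1f}x the compute per dollar")
# A decade of two-year doublings buys roughly 32x more computation per dollar.
```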




For many comparatively simple AI applications, that means that the cost of training a computer is falling, says Christopher Manning, an associate director of the Institute for Human-Centered AI at Stanford University. But that is not true everywhere. A combination of ballooning complexity and competition means costs at the cutting edge are rising sharply.




Dr Manning gives the example of BERT, an AI language model built by Google in 2018 and used in the firm’s search engine. It had more than 350m internal parameters and a prodigious appetite for data. It was trained using 3.3bn words of text culled mostly from Wikipedia, an online encyclopedia. These days, says Dr Manning, Wikipedia is not such a large data-set. “If you can train a system on 30bn words it’s going to perform better than one trained on 3bn.” And more data means more computing power to crunch it all.
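To get a sense of what "more than 350m internal parameters" means in practice, here is a hedged sketch that counts the weights of bert-large-uncased, a publicly released BERT variant with roughly 340m parameters; it assumes the Hugging Face transformers package (with PyTorch) is installed and stands in for, rather than reproduces, the model Google uses in its search engine.

```python
# Rough sketch: count the trainable parameters of a public BERT variant.
# Assumes `transformers` and `torch` are installed; downloads weights on first use.
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-large-uncased")  # public stand-in, ~340m parameters
n_params = sum(p.numel() for p in model.parameters())
print(f"bert-large-uncased has {n_params:,} parameters")  # on the order of 3.5e8
```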




OpenAI, a research firm based in California, says demand for processing power took off in 2012, as excitement around machine learning was starting to build. It has accelerated sharply. By 2018, the computer power used to train big models had risen 300,000-fold, and was doubling every three and a half months (see chart). It should know—to train its own “OpenAI Five” system, designed to beat humans at “Defense of the Ancients 2”, a popular video game, it scaled machine learning “to unprecedented levels”, running thousands of chips non-stop for more than ten months.
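The quoted figures imply a remarkable number of doublings. The back-of-envelope sketch below checks that arithmetic, using only the 300,000-fold and three-and-a-half-month numbers from the text, and contrasts it with what Moore's law alone would have delivered.

```python
# Back-of-envelope check of the growth figures quoted above.
import math

growth = 300_000            # reported rise in training compute for big models by 2018
doubling_months = 3.5       # reported doubling period

doublings = math.log2(growth)                  # about 18 doublings
months_needed = doublings * doubling_months    # roughly 64 months, a bit over 5 years
moores_law_growth = 2 ** (months_needed / 24)  # a 2-year doubling over the same span

print(f"{doublings:.1f} doublings over about {months_needed / 12:.1f} years")
print(f"Moore's law alone over that span: about {moores_law_growth:.0f}x")
```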




Exact figures on how much this all costs are scarce. But a paper published in 2019 by researchers at the University of Massachusetts Amherst estimated that training one version of “Transformer”, another big language model, could cost as much as $3m. Jerome Pesenti, Facebook’s head of AI, says that one round of training for the biggest models can cost “millions of dollars” in electricity consumption.




Help from the cloud



Facebook, which turned a profit of $18.5bn in 2019, can afford those bills. Those less flush with cash are feeling the pinch. Andreessen Horowitz, an influential American venture-capital firm, has pointed out that many AI startups rent their processing power from cloud-computing firms like Amazon and Microsoft. The resulting bills—sometimes 25% of revenue or more—are one reason, it says, that AI startups may make for less attractive investments than old-style software companies. In March Dr Manning’s colleagues at Stanford, including Fei-Fei Li, an AI luminary, called for the creation of a National Research Cloud, a cloud-computing initiative to help American AI researchers keep up with spiraling bills.




The growing demand for computing power has fueled a boom in chip design and specialized devices that can perform the calculations used in AI efficiently. The first wave of specialist chips were graphics processing units (GPUs), designed in the 1990s to boost video-game graphics. As luck would have it, GPUs are also fairly well-suited to the sort of mathematics found in AI.




Further specialisation is possible, and companies are piling in to provide it. In December, Intel, a giant chipmaker, bought Habana Labs, an Israeli firm, for $2bn. Graphcore, a British firm founded in 2016, was valued at $2bn in 2019. Incumbents such as Nvidia, the biggest GPU-maker, have reworked their designs to accommodate AI. Google has designed its own “tensor-processing unit” (TPU) chips in-house. Baidu, a Chinese tech giant, has done the same with its own “Kunlun” chips. Alfonso Marone at KPMG reckons the market for specialised AI chips is already worth around $10bn, and could reach $80bn by 2025.




“Computer architectures need to follow the structure of the data they’re processing,” says Nigel Toon, one of Graphcore’s co-founders. The most basic feature of AI workloads is that they are “embarrassingly parallel”, which means they can be cut into thousands of chunks which can all be worked on at the same time. Graphcore’s chips, for instance, have more than 1,200 individual number-crunching “cores”, and can be linked together to provide still more power. Cerebras, a Californian startup, has taken an extreme approach. Chips are usually made in batches, with dozens or hundreds etched onto standard silicon wafers 300mm in diameter. Each of Cerebras’s chips takes up an entire wafer by itself. That lets the firm cram 400,000 cores onto each.
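The phrase "embarrassingly parallel" has a precise meaning: the work splits into chunks that share no state and can all run at once. The toy Python sketch below illustrates the idea with a multiprocessing pool; it is an illustration of the workload pattern, not of Graphcore's or Cerebras's hardware.

```python
# A toy "embarrassingly parallel" workload: the chunks are independent,
# so they can be handed to as many workers (or cores) as are available.
from multiprocessing import Pool

def process_chunk(chunk):
    # Stand-in for the real per-chunk arithmetic (e.g. a slice of a matrix multiply).
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_chunks = 8
    chunks = [data[i::n_chunks] for i in range(n_chunks)]  # cut the work into pieces

    with Pool(processes=n_chunks) as pool:
        partials = pool.map(process_chunk, chunks)          # all worked on at the same time

    print(sum(partials))  # combine the partial results at the end
```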




Other optimisations are important, too. Andrew Feldman, one of Cerebras’s founders, points out that AI models spend a lot of their time multiplying numbers by zero. Since those calculations always yield zero, each one is unnecessary, and Cerebras’s chips are designed to avoid performing them. Unlike many tasks, says Mr Toon at Graphcore, ultra-precise calculations are not needed in AI. That means chip designers can save energy by reducing the fidelity of the numbers their creations are juggling. (Exactly how fuzzy the calculations can get remains an open question.)
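Both tricks are easy to mimic in software, which may make the hardware logic clearer. The sketch below (assuming NumPy and SciPy are installed) uses a sparse matrix so that multiplications by zero are never performed, then shows the memory saved by storing numbers in 16 rather than 64 bits; it illustrates the two ideas, not either firm's actual chip design.

```python
# Software analogues of the two hardware optimisations described above.
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
dense = rng.random((1000, 1000))
dense[dense < 0.9] = 0.0           # a weight matrix that is ~90% zeros

sparse = csr_matrix(dense)         # stores only the non-zero entries, so the
v = rng.random(1000)               # multiplications by zero are simply skipped
assert np.allclose(dense @ v, sparse @ v)

# Reduced precision: fewer bits per number means less memory (and, on real
# hardware, less energy), at the cost of some numerical fidelity.
w64 = rng.random(1_000_000)                    # float64 by default
w16 = w64.astype(np.float16)
print(w64.nbytes // w16.nbytes, "times smaller")                       # 4 times smaller
print("worst rounding error:", float(np.abs(w64 - w16.astype(np.float64)).max()))
```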




All that can add up to big gains. Mr Toon reckons that Graphcore's current chips are anywhere between ten and 50 times more efficient than GPUs. They have already found their way into specialised computers sold by Dell, as well as into Azure, Microsoft's cloud-computing service. Cerebras has delivered equipment to two big American government laboratories.




“Moore’s law isn’t possible any more”



Such innovations will be increasingly important, for the AI-fuelled explosion in demand for computer power comes just as Moore’s law is running out of steam. Shrinking chips is getting harder, and the benefits of doing so are not what they were. Last year Jensen Huang, Nvidia’s founder, opined bluntly that “Moore’s law isn’t possible any more”.




Quantum solutions and neuromantics



Other researchers are therefore looking at more exotic ideas. One is quantum computing, which uses the counter-intuitive properties of quantum mechanics to provide big speed-ups for some sorts of computation. One way to think about machine learning is as an optimisation problem, in which a computer is trying to make trade-offs between millions of variables to arrive at a solution that minimises as many as possible. A quantum-computing technique called Grover’s algorithm offers big potential speed-ups, says Krysta Svore, who leads the Quantum Systems division at Microsoft.
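The optimisation framing is concrete: training nudges a set of parameters until a loss function is as small as it can be made. The minimal gradient-descent sketch below uses just two variables as a stand-in for the millions a real model juggles; it illustrates the framing only and involves neither Grover's algorithm nor quantum hardware.

```python
# Minimal illustration of machine learning as an optimisation problem:
# repeatedly nudge the parameters downhill on a loss surface.

def loss(w0, w1):
    # Toy loss with its minimum at (3, -2); real models have millions of parameters.
    return (w0 - 3) ** 2 + (w1 + 2) ** 2

def gradient(w0, w1):
    return 2 * (w0 - 3), 2 * (w1 + 2)

w = [0.0, 0.0]
learning_rate = 0.1
for step in range(100):
    g0, g1 = gradient(*w)
    w[0] -= learning_rate * g0
    w[1] -= learning_rate * g1

print(f"minimum found near ({w[0]:.3f}, {w[1]:.3f}), loss = {loss(*w):.6f}")
```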




Another idea is to take inspiration from biology, which proves that current brute-force approaches are not the only way. Cerebras’s chips consume around 15kW when running flat-out, enough to power dozens of houses (an equivalent number of GPUs consumes many times more). A human brain, by contrast, uses about 20W of energy—about a thousandth as much—and is in many ways cleverer than its silicon counterpart. Firms such as Intel and IBM are therefore investigating “neuromorphic” chips, which contain components designed to mimic more closely the electrical behaviour of the neurons that make up biological brains.




For now, though, all that is far off. Quantum computers are relatively well-understood in theory, but despite billions of dollars in funding from tech giants such as Google, Microsoft and IBM, actually building them remains an engineering challenge. Neuromorphic chips have been built with existing technologies, but their designers are hamstrung by the fact that neuroscientists still do not understand what exactly brains do, or how they do it.




That means that, for the foreseeable future, AI researchers will have to squeeze every drop of performance from existing computing technologies. Mr Toon is bullish, arguing that there are plenty of gains to be had from more specialised hardware and from tweaking existing software to run faster. To quantify the nascent field’s progress, he offers an analogy with video games: “We’re past Pong,” he says. “We’re maybe at Pac-Man by now.” All those without millions to spend will be hoping he is right.




Translation team:

Lecea, a dream-chasing girl who firmly believes that reading lends one grace

Wuji, an idealistic boy who keeps striving for his dreams

Chao, a DPhil candidate and TE enthusiast who loves reading and thinking

Piggy, a little red panda who loves fish, dancing and sleeping in, food science at NUS

Iris, a former office veteran bursting with girlish enthusiasm, now an MTI student at an "international convent" and a lover of foreign periodicals


Proofreading team:

Lee, a cycling-loving friend to women, Timberland

Rachel, a STEM student who loves ballet, an atypical translator with a passion for the arts


3


Opinions | Comments | Reflections


Thoughts on this issue

Zihan, male, big-data engineer at a hedge fund

What we most want to know is whether chip technology will simply stall once we reach the end of Moore's law, rather like the hard technological bottleneck described in The Three-Body Problem. But before we reach that end, we first need to understand where it actually lies.


First, the choice of programming language alone can already make a program several times faster: the same program may run tens of times faster in C than in Python. Second come algorithmic improvements: during optimisation, complex computations can be restructured, and in particular many heavy matrix computations can be optimised at the linear-algebra level. Some of these optimisations reduce running time severalfold, or even by whole orders of magnitude.
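A quick, hedged illustration of the language point: the pure-Python loop and the NumPy call below compute the same dot product, but the latter dispatches to compiled C/BLAS routines and is typically tens to hundreds of times faster; the exact ratio depends on the machine.

```python
# Same computation, two implementations: an interpreted Python loop
# versus NumPy, which calls into optimised native (C / BLAS) code.
import time
import numpy as np

n = 2_000_000
a = np.random.random(n)
b = np.random.random(n)

start = time.perf_counter()
total = 0.0
for x, y in zip(a, b):         # interpreted, element by element
    total += x * y
loop_time = time.perf_counter() - start

start = time.perf_counter()
total_np = np.dot(a, b)        # one call into compiled code
numpy_time = time.perf_counter() - start

print(f"loop: {loop_time:.3f}s   numpy: {numpy_time:.4f}s   "
      f"speed-up ~{loop_time / numpy_time:.0f}x")
assert abs(total - total_np) < 1e-6 * abs(total_np)
```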


Beyond programming languages and algorithmic optimisation, the article also mentions quantum computing. Its conceptual framework differs from that of conventional machines with binary memory: a quantum computer's memory can exist in multiple states at the same time, represented by quantum bits (qubits). That sounds hard to grasp, so let me put it more plainly. If you have two classical bits, then at any one moment you can represent exactly one of four states:

(00), (01), (10), (11)


A pair of qubits, by contrast, can represent all four of those states at the same time. What does that mean? If your quantum computer has three qubits, it can simultaneously represent 2^3 = 8 states; with four qubits, 2^4 = 16. As of October 2019, Google's quantum computer had 53 qubits, which at a single point in time can represent roughly 9 x 10^15 states, on the order of 10,000,000,000,000,000. That kind of computing power is formidable, and it grows not linearly but exponentially.
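The exponential growth is just 2^n; the tiny loop below reproduces the numbers used in this paragraph.

```python
# Number of basis states that n qubits can represent simultaneously: 2**n.
for n in (2, 3, 4, 53):
    print(f"{n:2d} qubits -> {2 ** n:,} states")
# 53 qubits -> 9,007,199,254,740,992 states, roughly 9 x 10^15.
```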


Yet this seemingly invincible technology is still in its infancy. An ordinary processor can keep track of a great many bits, but managing qubits is extremely hard: entangled qubits can come apart again within a very short time, which makes the whole control mechanism enormously complicated.


What does all this have to do with us? Encryption and decryption, for a start. Cracking any password in the world is ultimately just a hard computational problem. Everything from our phone passcodes to our medical records to our banking details is stored in databases under the protection of encryption. Once that computational power falls into the wrong hands, however, every sharp tool becomes a weapon. The flip side of encryption is code-breaking: a mature quantum attack might break ordinary encryption in seconds, while, conversely, information protected by a mature quantum encryption scheme might not be cracked by conventional methods before the end of the universe.


Cloud computing will gradually bring this professional-grade infrastructure into everyday life. Many firms already rely heavily on cloud services to optimise their spending: running a data centre is laborious and expensive, whereas renting capacity is far easier, which gives small and medium-sized businesses a better chance to grow. Yet whether we are talking about computational optimisation or technological innovation in the traditional sense, none of it can be separated from security. That is a question worth pondering for all of us. How should future law merge and co-evolve with these emerging technologies so that most users can rely on them with peace of mind? And how do we make sure such powerful tools do not fall into the wrong hands? Rapid development inevitably brings more challenges, especially for technologies that grow exponentially.


