
SystemDesign's Blog

Experts Explain How to Make Multicore Design Easier  2009-06-29 16:28
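Multicore chips are here to stay. Now what? That question resonates among tools vendors, design engineers, software developers and even the people who measure the performance and efficiency of semiconductors. There is now a Multicore Expo and a Multicore Association whose membership reads like a who's who of electronics. Many working groups are developing different strategies to tackle this Hydra-like problem, which has baffled the best software minds in the world for 40 years.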

Why multicore?

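Multicore was already firmly on chipmakers' horizon when they reached the 130nm process node. By the next node, they realized, it would be impossible to raise clock speeds without cooking the chip. For all intents and purposes, classical scaling (gaining performance at each new process node) ended at 90nm. The solution was to add more processing cores running at lower speeds and hand the burden of putting them to work to software developers. After all, it is hard to argue with the laws of physics.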

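This explains why most major university computer science departments now devote a significant share of their research to the problem of how to program multicore chips. The problem is interesting, and the payoff for whoever solves it could be huge.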

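It also helps explain why Intel invested $218.5 million in VMware in 2007: virtualization is a safety net for using more cores on a chip. If software cannot be developed to run on multiple cores, virtual machines at least let multiple instances of one operating system, or several different operating systems, run on the chip. Intel is also adding a "turbo mode" to its upcoming chips, which lets a single core draw on more of the chip's total horsepower in bursts when an application demands it.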

Designing multicore chips

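What becomes painfully obvious as you descend on the multicore world from 60,000 feet is that one core is not necessarily the same as the next, although it can be. Companies such as Intel and Freescale make semiconductors with homogeneous cores, systems on chip (SoCs) use heterogeneous cores, and some SoCs contain both.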

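It is easier to design a chip with homogeneous cores: you develop the core once, then work out the best way to share memory and handle bus traffic patterns. But that approach is far less efficient for a multifunction device such as a smartphone, because every application needs a different amount of processing power, and giving each one the maximum is not an ideal strategy.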

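In the embedded world, ARM has taken a first stab at this problem with its ARM11 MPCore multicore processor, which can be configured with one to four cores.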

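To simplify building the chips, all of the major EDA tools vendors either have added or are adding multicore support to their flows. Mentor Graphics has been working on multicore debugging with its Seamless products, Synopsys has added multicore support for verification, implementation and manufacturing, and Cadence has added multicore support to Virtuoso. Expect more announcements from these and other vendors over the next few months, along with virtual prototyping solutions and faster simulation.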

Where's the application software?

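So now that the tools to build the chips and carry them through verification and manufacturing are falling into place, what's next?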

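The next piece is application software. Most code written in the past takes a serial approach, and there is no easy way to compile it onto multiple cores, although some tools help.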

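Criticalblue has introduced its Prism tool to help parallelize legacy code. You still can't push a button to make it all work, and an application reworked for two cores won't automatically take full advantage of 32 cores, but this kind of tool is a step in the right direction.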

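Another important piece of the puzzle is mapping the software to the interconnect. PolyCore has developed a middleware layer and tools that do this by distributing functions across different cores, which is vital in multicore topologies, where shared buses and memory create problems that never existed in single-core chips.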

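Finally, Virtutech has built a simulation environment for multicore applications with its Simics tool, which lets developers explore what-if scenarios for their applications.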

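But none of these tools yet produces new applications that scale across many cores in any real volume. Sven Brehmer, president of PolyCore, says the gap between hardware and software is wider than it has ever been, and it will take years to close.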

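"A broader group of developers is using multicore, but they don't yet know how to develop the software, or they don't want to spend money on the problem," Brehmer said. "There is no magic bullet here, but the open source community sees a need to simplify multicore. We've solved part of the problem, but there is a lot of work left, and it has to be done at a pace that works for software developers. You can't go from two to six cores overnight."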

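The road is hard, but at least there is an incentive. "With all the potential monetary rewards, something will come out of this," said Markus Levy, president of the Embedded Microprocessor Benchmark Consortium (EEMBC).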

Hype vs. reality

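When multicore programming will reach critical mass is another matter. For all the talk about multicore initiatives, there has never been a consistent industry effort to make multicore approaches work. In the past, the problem of parallelizing software was confined to a small circle of researchers in university computer science departments and at companies such as IBM and AT&T. There was never a large-scale effort to solve it because, for the most part, it didn't need to be solved.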

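Making matters more confusing, it is hard to get a straight answer about what is real in multicore and what isn't. Just because software runs on a multicore machine doesn't mean it runs faster on four cores than on one. In fact, some software will run on a four-core processor without using more than one core. "There are a lot of companies taking existing stuff, putting a new label on it and saying it's multicore compatible," said Levy.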
[Figure: Taming multicore. Source: EEMBC]

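What has worked exceptionally well in the multicore world is software that can be parsed into specific pieces, with graphics and video rendering as prime examples. Imagination Technologies, a U.K.-based IP vendor, builds scalable multicore graphics engines that parallelize the computing below the application level, Tony King-Smith, vice president of marketing for the company's technology division, said in a keynote at the recent Multicore Expo.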

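"We can parallelize from 1 to 4 pipes and beyond, and we can multicore 1 pipe to 64 cores," he said. "But to do this, you have to get the architecture right. If you get it wrong, you'll spend too much effort on overhead."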

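Freescale has taken a similar approach with its multimedia DSP technology. Kent Fisher, chief systems engineer for Freescale's networking and multimedia group, said the big decision for his division is whether to use more, smaller cores or a few larger ones. "It depends on your application," he said. "And until the software tools catch up to the hardware, frequency and infrastructure per clock will continue to matter."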

He also pointed out a problem with multicore power specifications: not everyone specifies power the same way.

Splitting the atom

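From the software application's perspective, several challenges need to be considered. First, there has to be a proper balance between splitting up different functions and splitting those functions into too many parts.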

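"What you really need to do is find the relative load of each function of an application," said PolyCore's Brehmer. "If you have a computation, you may be able to duplicate it on multiple cores. But you also have to look at data dependencies, because you can't break a function out if it depends on data from some other place. Otherwise you'll just be waiting for that data."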

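That is at least a major step toward understanding the resources a chip will need, and it works well for homogeneous multicore systems. The next step is better utilization of heterogeneous cores, which will require understanding application functionality all the way down at the chip architecture level. It makes no sense to give every part of an application the same level of power if those parts differ in importance or in the amount of processing they require.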

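Finally, as multicore SoCs become more integrated into system-level design, some software may be developed with much thinner operating system layers, or even execute directly on the metal.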

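The promise is better performance and, ultimately, lower power consumption, but delivering it will take time, the committed effort of engineers and scientists, and collaboration among groups that have never spoken the same language. Multicore is also multidisciplinary, and that is a whole different problem to solve.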

Original text:

Multicore chips are here to stay. Now what?

That question is echoing up and down the ranks of tools vendors, design engineers, software developers and even among people who measure the performance and efficiency of semiconductors. There is now a Multicore Expo and a Multicore Association that includes a who’s who of electronics. And there are lots of working groups developing different strategies to tackle this Hydra-like creature that has befuddled the best software minds in the world for four decades.

Why multicore?

Multicore was firmly on the horizon for chipmakers when they hit the 130nm process node. By the next process node, they realized, it would be impossible to turn up clock speeds without cooking the chip. For all intents and purposes, classical scaling—gaining performance at each new process node—ended at 90nm. The solution was to add more processing cores at lower speeds, and hand off the burden to software developers to fix the problem. After all, it’s hard to argue with the laws of physics.

This explains why most major university computer science departments now are dedicating a significant portion of their research to solving the conundrum of how to program multiple cores. The problem is interesting and the payoff can be huge to anyone who solves it.

It also helps explain why Intel invested $218.5 million in VMware in 2007, which is a safety net for utilizing more cores on a chip. If software can’t be developed to run on multiple cores, at least multiple instances of an operating system or multiple operating systems can run on the chip using virtual machines. Intel is adding “turbo mode” to its upcoming chips, though, which allows more of a chip’s total horsepower to be utilized in bursts on a single core if the application demands it.

Designing multicore chips

What becomes painfully obvious as you descend from 60,000 feet on the multicore world is that one core is not necessarily the same as the next. It can be. There are homogeneous cores in semiconductors made by companies such as Intel and Freescale, and there are heterogeneous cores in systems on chip, and sometimes there are both in SoCs.

While it’s easier to design a chip with homogeneous cores—you simply develop it once and then figure out the best way to share memory and bus traffic patterns—that approach isn’t nearly as efficient for a multifunction device such as a smart phone. The reason is that every application requires a different amount of processing power, and assigning the maximum to each one isn’t an ideal strategy.

In the embedded world, ARM has taken a first stab at this problem with its ARM11 MPCore multicore processor, which can be configured for one to four cores.
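To make the smartphone argument concrete, here is a minimal sketch that gives each function the lowest-power core type able to sustain its demand and compares that with running every function on the biggest core. The core types, capacities, power figures and workload demands are hypothetical numbers chosen for illustration, not data from the article or from any real SoC, and the model assumes each function gets a core of its own.

```python
# Hypothetical core types: (name, capacity in arbitrary work units, power in mW).
# These numbers are illustrative assumptions, not figures for any real chip.
CORE_TYPES = [("little", 200, 50), ("mid", 500, 180), ("big", 1000, 600)]


def pick_core(demand):
    """Return the lowest-power core type whose capacity covers `demand`."""
    fits = [core for core in CORE_TYPES if core[1] >= demand]
    if not fits:
        raise ValueError(f"no single core can sustain a demand of {demand}")
    return min(fits, key=lambda core: core[2])


def total_power(demands, heterogeneous=True):
    """Total power if each function runs on its own core.

    heterogeneous=True  -> each function gets the cheapest core that fits.
    heterogeneous=False -> every function gets the biggest core (homogeneous chip).
    """
    if heterogeneous:
        return sum(pick_core(d)[2] for d in demands)
    biggest = max(CORE_TYPES, key=lambda core: core[1])
    return biggest[2] * len(demands)


# Hypothetical smartphone workload: audio, UI, video decode.
workload = [150, 400, 900]
print(total_power(workload))                       # -> 830 (50 + 180 + 600)
print(total_power(workload, heterogeneous=False))  # -> 1800 (3 x 600)
```

The toy model ignores idle power, migration and shared resources; it only shows why "assigning the maximum to each one" wastes power when demands differ.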

And to simplify building the chips, all of the major EDA tools vendors either have added or are working on multicore elements in their flows. Mentor Graphics has been working on multicore debugging with its Seamless products, Synopsys has added multicore support for verification, implementation and manufacturing, and Cadence has added multicore support to Virtuoso. Expect to see more announcements from these and other vendors over the next few months, as well as virtual prototyping solutions and faster simulation.

Where’s the application software?

So now that the tools to make the chips and follow them through the verification and manufacturing are being prepared, what’s next?

The next piece is application software, and most of the code that has been written in the past has been written using a serial approach. There is no easy way to compile that onto multiple cores, although there are tools to help.

Criticalblue just introduced its Prism tool to help parallelize legacy code. While you still can’t push a button to make it all work, and you can’t rework applications for two cores and have them fully take advantage of 32 cores, this kind of tool is a step in the right direction.

Another important piece of the puzzle is mapping the software to the interconnect. PolyCore has developed a middleware layer and tools to do that, distributing functions to different cores—something that is vital in multicore topologies, where shared busses and memory create problems that never existed in single-core chips.
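One way to see why a shared bus matters is a simple bandwidth-ceiling model: if every core needs a certain memory bandwidth and all cores share one bus, adding cores stops helping once the bus saturates. This is a deliberately crude sketch (it ignores caches, arbitration latency and whatever PolyCore's tools actually do), and the bandwidth figures are hypothetical.

```python
def bus_limited_speedup(cores, per_core_gbps, bus_gbps):
    """Speedup over one core when all cores share a single memory bus.

    Assumes perfectly parallel work whose throughput is capped either by
    the number of cores or by how many cores the bus can keep fed.
    """
    if cores < 1 or per_core_gbps <= 0 or bus_gbps <= 0:
        raise ValueError("cores must be >= 1 and bandwidths must be positive")
    cores_the_bus_can_feed = bus_gbps / per_core_gbps
    # If the bus cannot even feed one core, extra cores add nothing: speedup 1.
    return min(cores, max(1.0, cores_the_bus_can_feed))


# Hypothetical numbers: each core streams 3 GB/s, the shared bus carries 8 GB/s.
for n in (1, 2, 4, 8, 32):
    print(n, round(bus_limited_speedup(n, 3.0, 8.0), 2))
```

With these assumed numbers the speedup stops growing at about 2.67 (8/3) once there are more than two cores, a limit that a single-core chip never runs into.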

Finally, Virtutech has developed a simulated environment for multicore applications with its Simics tool, creating what-if scenarios for applications.

But all of these tools still don’t produce the kind of volume of new applications that can be scaled across many cores. Sven Brehmer, president of PolyCore, said the gap between hardware and software is larger than it has ever been—and it will take years to close that gap again.

“There is a broader group of developers using multicore but they don’t know how to develop software yet or they don’t want to spend money on this problem,” Brehmer said. “There is no magic bullet here, but the open source community sees a need to simplify multicore. We’ve solved a portion of the problem but there’s a lot of work to be done and it has to be done at a pace that works for software developers. You can’t go from two to six cores overnight.”

But at least there is an incentive. “With all the potential monetary rewards, something will come out of this,” said Markus Levy, president of the Embedded Microprocessor Benchmark Consortium (EEMBC).

Hype vs. reality

When multicore programming gains critical mass is another matter. For all the talk about multicore initiatives, the reality is that there has never been a consistent industry effort to make multicore approaches work. And the problems of parallelizing software in the past have been confined to a small circle of computer science researchers at universities and at companies like IBM and AT&T. There has never been a massive effort to solve the problem because for the most part it didn't have to be solved.

Making matters even more confusing, it’s hard to get a straight answer about what’s real in multicore and what isn’t. Just because software can run on a multicore machine doesn’t mean it runs faster on four cores than on one. In fact, some software may not take advantage of more than one core even though it will work on a four-core processor. “There are a lot of companies taking existing stuff and putting a new label on it and saying it’s multicore compatible,” said Levy.
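Levy's point, that running on four cores is not the same as running faster on four cores, is usually quantified with Amdahl's law. The article does not mention it, but it is the standard model: if a fraction p of the work can be parallelized, the best-case speedup on n cores is 1 / ((1 - p) + p / n). The sketch below also adds an optional overhead term per extra core; that term is an assumption of this example, a crude stand-in for synchronization and communication costs, and not part of Amdahl's formula.

```python
def speedup(p, n, overhead_per_core=0.0):
    """Amdahl's-law speedup of n cores over one core.

    p: fraction of the single-core run time that can be parallelized (0..1).
    overhead_per_core: hypothetical extra time (as a fraction of the
        single-core run time) added for each core beyond the first.
    """
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    time_on_n = (1.0 - p) + p / n + overhead_per_core * (n - 1)
    return 1.0 / time_on_n


def best_core_count(p, overhead_per_core, max_cores):
    """Core count in 1..max_cores that maximizes speedup()."""
    return max(range(1, max_cores + 1),
               key=lambda n: speedup(p, n, overhead_per_core))


# Purely serial code "runs" on four cores but gains nothing.
print(speedup(0.0, 4))                   # -> 1.0
# Half-parallel code: 1 / (0.5 + 0.5 / 4) = 1.6, far short of 4.
print(speedup(0.5, 4))                   # -> 1.6
# With per-core overhead, more cores eventually make things slower.
print(best_core_count(0.99, 0.001, 64))  # -> 31
```

Even with 99% of the work parallel, the assumed overhead makes 31 cores faster than 64, which is one way to read "you'll spend too much effort on overhead".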

[Figure] Source: EEMBC

What has worked exceptionally well in the multicore world are applications that can be parsed into specific pieces. Graphics and video rendering work particularly well, for example. Imagination Technologies, a U.K.-based IP vendor, builds scalable multicore graphics engines that parallelize the computing below the application level, Tony King-Smith, vice president of marketing for the company’s technology division, said during a keynote at the recent Multicore Expo.

“We can parallelize from 1 to 4 pipes and beyond, and we can multicore 1 pipe to 64 cores,” he said. “But to do this, you have to get the architecture right. If you get it wrong, you’ll spend too much effort on overhead.”
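Rendering parallelizes well because each pixel can usually be computed without waiting for any other pixel. A minimal sketch of that idea splits a frame into horizontal bands and renders the bands concurrently with the standard library's `concurrent.futures.ThreadPoolExecutor`. The toy `shade()` function and the frame size are hypothetical, and the threads only demonstrate the decomposition: in CPython, CPU-bound threads do not execute in parallel because of the global interpreter lock, so a real speedup would need processes, native code or a GPU.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48  # hypothetical frame size in pixels


def shade(x, y):
    """Toy per-pixel 'shader': the result depends only on (x, y),
    so no pixel ever waits for data from another pixel."""
    return (x * 31 + y * 17) % 256


def render_band(y_start, y_end):
    """Render rows y_start..y_end-1 of the frame."""
    return [[shade(x, y) for x in range(WIDTH)] for y in range(y_start, y_end)]


def render_serial():
    return render_band(0, HEIGHT)


def render_parallel(workers=4):
    """Split the frame into `workers` bands and render them concurrently."""
    step = (HEIGHT + workers - 1) // workers  # ceiling division
    bands = [(y, min(y + step, HEIGHT)) for y in range(0, HEIGHT, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so stitching the
        # bands back together preserves row order.
        results = list(pool.map(lambda band: render_band(*band), bands))
    frame = []
    for rows in results:
        frame.extend(rows)
    return frame


print(render_parallel(4) == render_serial())  # -> True
```

The parallel frame is identical to the serial one no matter how many bands are used, which is exactly the property that makes graphics and video such good multicore candidates.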

Freescale has taken a similar approach with its multimedia DSP technology. Kent Fisher, chief systems engineer for Freescale’s networking and multimedia group, said the big decision for his division is whether to use more smaller cores or a few larger cores. “It depends on your application,” he said. “And until the software tools catch up to the hardware, frequency and infrastructure per clock will continue to matter.”

He noted there is a problem in multicore power specifications, as well. He said that not everyone specifies power the same way.

Splitting the atom

From a software application perspective, there are several challenges that need to be considered. First, there needs to be a proper balance between splitting up different functions and splitting those functions into too many parts.

“What you really need to do is find the relative load of each function of an application,” said PolyCore’s Brehmer. “If you have a computation, you may be able to duplicate that on multiple cores. But you also have to look at data dependencies, because you can’t break a function out if it depends on data from some other place. Otherwise you’ll just be waiting for that data.”

That’s at least a major step toward understanding the resources that will be needed on a chip, which works well with homogeneous multicore systems. The next step will be better utilization of heterogeneous cores, which will require an understanding of application functionality all the way at the chip architecture level. It doesn’t make sense to have the same level of power for all parts of an application if those pieces are not identical in importance or the amount of processing that’s required.
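Brehmer's advice (estimate each function's relative load, spread independent work across cores, and respect data dependencies) maps naturally onto classic list scheduling. The sketch below is a generic greedy list scheduler for identical cores, not PolyCore's middleware, and the task names and loads are hypothetical. Each task starts only when the tasks it needs data from have finished, and it goes to whichever core lets it start earliest.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def schedule(tasks, cores):
    """Greedy list scheduling on identical cores.

    tasks: {name: (relative_load, [names this task needs data from])};
           every dependency must itself be a key in `tasks`.
    Returns ({name: core_index}, makespan).
    """
    graph = {name: deps for name, (_, deps) in tasks.items()}
    core_free_at = [0.0] * cores
    finish = {}
    placement = {}
    # Topological order guarantees every dependency is scheduled first.
    for name in TopologicalSorter(graph).static_order():
        load, deps = tasks[name]
        # A task cannot start until all the data it depends on exists.
        ready_at = max((finish[d] for d in deps), default=0.0)
        core = min(range(cores), key=lambda c: max(core_free_at[c], ready_at))
        start = max(core_free_at[core], ready_at)
        finish[name] = start + load
        core_free_at[core] = finish[name]
        placement[name] = core
    return placement, max(finish.values(), default=0.0)


# Four independent functions spread across four cores...
independent = {f"filter{i}": (1.0, []) for i in range(4)}
print(schedule(independent, 4)[1])  # -> 1.0
# ...but a chain of dependent functions stays serial however many cores exist.
chain = {"decode": (1.0, []), "scale": (1.0, ["decode"]),
         "filter": (1.0, ["scale"]), "encode": (1.0, ["filter"])}
print(schedule(chain, 4)[1])        # -> 4.0
```

The chain shows Brehmer's warning in miniature: three of the four cores sit idle because each stage is "just waiting for that data".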

And finally, some software development may be done with much thinner layers of an operating system—or even direct execution into the metal—as multicore SoCs become more integrated into system-level design.

The promise is better performance and ultimately lower power consumption, but it’s going to take time, committed effort of engineers and scientists, and collaboration from groups that in the past have never spoken the same language. Multicore is also multidisciplinary, and that’s a whole different problem to solve.

 

Original article: http://chipdesignmag.com/sld/blog/2009/03/27/taming-the-multicore-beast/


Author: Ed Sperling   Translated and edited by: Lü Yong (吕勇), 与非网 (EEFocus)
Category: Expert Blog