What does the future of supercomputing look like? Much of the current research and development in HPC is focused on exascale computing. For HPC architectures, this can be taken to mean working towards a system with a floating point performance of at least 1 exaflop/s (i.e. 10¹⁸, or a million million million, floating point operations per second). This is around 10 times more than the number one system in the Top500 list at the end of 2016 (China’s Sunway TaihuLight). Although at some level this is just an arbitrary number, it has become a significant technological (and political) milestone.
Both China and Japan are known to be working on exascale systems with a projected date of 2020. In 2016, the US government announced the Exascale Computing Project, which aims to have its first supercomputer operating at 1 exaflop/s or more in production by 2021. If past trends in the Top500 list had continued, a 1 exaflop/s system would have been expected in 2018. The fact that this date will be missed by at least 2 or 3 years is a measure of the technical challenges involved, in both hardware and software.
Some of the main barriers to building a useful and economically viable exascale machine are as follows:
Since around 2006, there has been little significant increase in the clock frequency of processors. The only way to extract more performance from computers has therefore been through more parallelism: more cores per chip, and more floating point operations per second from each core. Without a fundamental move away from current silicon technology, there is no real prospect of significantly higher clock frequencies in the next 5-10 years. Nor is there much prospect of significantly reducing network latencies. On the plus side, new memory designs such as 3D stacked memory do promise some increases in memory bandwidth.
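With clock frequencies flat, peak performance comes down to a simple product: number of cores × clock frequency × floating point operations per cycle per core. The minimal sketch below turns that around to estimate how many cores a 1 exaflop/s machine would need. The core parameters (2.0 GHz, 32 double precision operations per cycle) are hypothetical values chosen for illustration, not figures for any particular processor. The helper names `peak_flops` and `cores_needed` are likewise illustrative.

```python
import math

EXAFLOP = 1e18  # 10^18 floating point operations per second


def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak rate: cores x clock frequency x flops per cycle per core."""
    return cores * clock_hz * flops_per_cycle


def cores_needed(target_flops: float, clock_hz: float, flops_per_cycle: int) -> int:
    """Smallest core count whose theoretical peak reaches target_flops."""
    return math.ceil(target_flops / (clock_hz * flops_per_cycle))


# Hypothetical core: 2.0 GHz, 32 double precision flops per cycle.
print(cores_needed(EXAFLOP, 2.0e9, 32))  # 15625000
# The same hypothetical core clocked down to 1.5 GHz needs more cores.
print(cores_needed(EXAFLOP, 1.5e9, 32))  # 20833334
```

Under these assumptions, an exascale machine needs on the order of ten million cores. Any drop in clock frequency must be made up by proportionally more cores, and this is the same trade-off that reappears when trying to save power.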
If we were to build a 1 exaflop/s computer today using standard Intel Xeon processors, it would consume around 400 megawatts of power: enough electricity for 700,000 households, or about 1% of the UK’s entire electricity generating capacity! Not only would that be hugely expensive, it would also require a big investment in engineering infrastructure and would be politically challenging because of its carbon footprint. The target power consumption for an exascale system is 20-30 megawatts. Some of the required savings can be made by using special purpose manycore processors, such as GPUs, instead of standard Xeons, but even then we are still around a factor of 5 off this target. Closing this gap is one of the big challenges in the short to medium term. Some savings can be made by reducing the clock frequency of processors, but this has to be compensated for by a corresponding increase in the number of cores in order to meet the target for total computational capacity.
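The power budget can also be expressed as an energy efficiency: floating point operations per second delivered per watt. The short sketch below does this arithmetic using only the figures quoted above (1 exaflop/s, about 400 MW with standard Xeons, and a 20-30 MW target). The function name `gflops_per_watt` is a hypothetical helper introduced for illustration.

```python
EXAFLOP = 1e18  # floating point operations per second


def gflops_per_watt(flops: float, power_watts: float) -> float:
    """Energy efficiency in gigaflop/s per watt."""
    return flops / power_watts / 1e9


today = gflops_per_watt(EXAFLOP, 400e6)       # ~400 MW with standard Xeons
target_low = gflops_per_watt(EXAFLOP, 30e6)   # 30 MW end of the target range
target_high = gflops_per_watt(EXAFLOP, 20e6)  # 20 MW end of the target range

print(f"Standard Xeons: {today:.1f} Gflop/s per W")
# Standard Xeons: 2.5 Gflop/s per W
print(f"Exascale target: {target_low:.1f} to {target_high:.1f} Gflop/s per W")
# Exascale target: 33.3 to 50.0 Gflop/s per W
print(f"Improvement needed: {target_low / today:.0f}x to {target_high / today:.0f}x")
# Improvement needed: 13x to 20x
```

In other words, each watt would need to deliver somewhere between 33 and 50 gigaflop/s, which is roughly 13 to 20 times more than a Xeon-based design would manage.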
As the number of cores and other components (memory, network links and disks) increases, so does the rate at which components fail. As a rule of thumb, a supercomputer service becomes unacceptable to users if the rate of visible failures (i.e. failures that cause running applications to crash) is more than about one per week. As supercomputers are built with more and more components, the mean time between failures tends to decrease, eventually to the point where the system becomes effectively unusable. While some types of program can be written to tolerate hardware failures, this turns out to be very hard to do for most HPC applications without seriously compromising performance.
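A simple reliability model shows why the mean time between failures (MTBF) shrinks as systems grow. The sketch below assumes that component failures are independent and occur at a constant rate (exponentially distributed lifetimes), so the failure rates of N identical components add up and the system MTBF is the component MTBF divided by N. It also treats every node failure as a visible failure. The node MTBF of 25 years is a hypothetical figure, not a measured one, and the helper names are illustrative.

```python
import math

HOURS_PER_WEEK = 24 * 7


def system_mtbf_hours(component_mtbf_hours: float, n_components: int) -> float:
    """MTBF of a system that fails when any one of n identical components fails.

    Assumes independent failures at a constant rate, so failure rates add.
    """
    return component_mtbf_hours / n_components


def prob_job_survives(job_hours: float, component_mtbf_hours: float, n_components: int) -> float:
    """Probability that a job spanning n components sees no failure while it runs."""
    return math.exp(-job_hours * n_components / component_mtbf_hours)


def max_components(component_mtbf_hours: float, target_mtbf_hours: float) -> int:
    """Largest component count whose system MTBF still meets the target."""
    return math.floor(component_mtbf_hours / target_mtbf_hours)


# Hypothetical node MTBF of 25 years (an assumption for illustration only).
node_mtbf = 25 * 365 * 24  # 219,000 hours

for nodes in (1_000, 10_000, 100_000):
    mtbf = system_mtbf_hours(node_mtbf, nodes)
    p = prob_job_survives(24, node_mtbf, nodes)
    print(f"{nodes:>7} nodes: system MTBF {mtbf:8.2f} h, 24 h job survives with p = {p:.3f}")

# Largest number of such nodes that keeps failures below about one per week.
print(max_components(node_mtbf, HOURS_PER_WEEK))  # 1303
```

Even with very reliable individual nodes, a system built from 100,000 of them would, under this model, fail every couple of hours. A day-long job using the whole machine would almost never finish without a failure.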
It’s all very well to build an exascale computer, but there is not much point unless applications can make practical use of it. As the degree of parallelism in the hardware continues to increase, it gets harder and harder to make applications scale without running into unavoidable bottlenecks. Strong scaling (where the problem size is fixed and the achievable speedup is limited by Amdahl’s Law) is very challenging indeed. Weak scaling (where the problem size grows with the number of cores, as in Gustafson’s Law) is easier to achieve, but often does not result in solving the problems scientists are actually interested in.
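The gap between strong and weak scaling is easy to see numerically. The sketch below uses the standard textbook forms of the two laws, with p as the parallel fraction and n as the number of processors. For Gustafson’s Law, p is taken as the parallel fraction of the scaled run on n processors. The choice of p = 0.999 is only an illustrative value.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Strong scaling: fixed problem size, parallel fraction p, n processors."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(p: float, n: int) -> float:
    """Weak scaling: problem grows with n; p is the parallel fraction on n processors."""
    return (1.0 - p) + p * n


for n in (1_000, 1_000_000):
    print(f"n = {n:>9}: Amdahl {amdahl_speedup(0.999, n):10.1f}   "
          f"Gustafson {gustafson_speedup(0.999, n):12.1f}")
```

With 99.9% of the work parallelised, Amdahl’s Law caps the strong-scaling speedup below 1/(1 - p) = 1000, however many cores are added. In contrast, the weak-scaling speedup keeps growing almost linearly with n, but only because the problem itself is getting bigger.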
It is likely that for the first generation of exascale systems, there will be only a small number (maybe only in the low tens) of application codes that can usefully exploit them. Even achieving this will require heroic efforts by application developers and computer scientists, and also some degree of co-design: the hardware itself may be tailored to suit one or a few particular application codes rather than providing full general purpose functionality.
So that’s where HPC is heading in the next 5 years: a strong push towards exascale systems. Even though these may only be usable by a small number of applications, the technologies developed along the way (in both hardware and software) will undoubtedly have an impact at more modest (i.e. tera- and peta-) scales. The performance of real application codes will continue to increase, albeit at a somewhat slower rate than in past decades. In your opinion, which of the above barriers is the hardest to breach? Why do you think so?