Ten Years

Ten years to forge one sword!
-------------------------------------------------
Operating System Research / Technique

Thursday, October 05, 2006

(SOSP'05 Note) Speculative Execution in a Distributed File System


Speculative Execution in a Distributed File System
Edmund B. Nightingale et al.
SOSP 2005


This paper discusses how to support process-level speculative execution in a distributed file system. An exciting topic.


Speculative execution is not a new technique; it is already widely used in CPU design. In IA-64's EPIC architecture, speculative execution is an important theme. (Come to think of it, I first ran into this term six or seven years ago.) The authors apply the idea at the process level: they modify the OS so that a process can execute speculatively and be rolled back on failure. The Introduction puts it this way:
"We demonstrate that, with operating system support for lightweight checkpointing, speculative execution, and tracking of causal interdependencies between processes, distributed file systems can be fast, safe, and consistent. Rather than block a process while waiting for the result of a remote communication with a file server, the operating system checkpoints its state, predicts the result of the communication, and continues to execute the process speculatively. If the prediction is correct, the checkpoint is discarded; if it is false, the application is rolled back to the checkpoint."


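The authors' motivation is clear. In distributed file systems such as NFS, the client has a cache, but to keep that cache consistent it still has to validate it with synchronous remote calls. Since the cache is rarely stale, an OS that supports speculative execution can let the process continue with the cached contents, query the server asynchronously in the meantime, and roll the process back if the check fails. The authors explicitly list the conditions under which this kind of speculation pays off: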
1. The results of speculative operations are highly predictable.
2. Checkpointing is often faster than remote I/O.
3. Modern computers often have spare resources.
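The idea of continuing on cached data while the validation is in flight can be sketched at user level. This is only an illustration, not the paper's kernel mechanism: `speculative_read`, the `(version, data)` tuples, and the dict standing in for the file server are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def speculative_read(cache, server, path, compute):
    """Run compute() on the cached copy while the server check is in flight.

    cache and server are plain dicts mapping path -> (version, data);
    the server dict is a hypothetical stand-in for a remote file server.
    """
    cached_version, cached_data = cache[path]
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Asynchronous validation stands in for the remote RPC.
        check = pool.submit(server.get, path)
        # Predict "cache is valid" and keep working instead of blocking.
        speculative_result = compute(cached_data)
        server_version, server_data = check.result()
    if server_version == cached_version:
        # Prediction was right: keep the speculative result.
        return speculative_result, "committed"
    # Prediction was wrong: drop the result, refresh the cache, redo the work.
    cache[path] = (server_version, server_data)
    return compute(server_data), "rolled back"
```

In this sketch the "rollback" is trivial because `compute` has no side effects; the hard part addressed by the paper is doing the same for a whole process with arbitrary state.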


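To execute speculatively, a checkpoint must first be taken when speculation begins, so that rollback is possible. The approach used in the paper is to fork at the system call entry, rely on fork's copy-on-write mechanism to build a copy of the process for speculative execution, and track kernel objects, recording whatever undo operations are needed. For this to be correct, the following two conditions must hold: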
1. Speculative state should never be visible to the user or any external devices.
2. A process should never view speculative state unless it is already speculatively dependent upon that state.
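A toy model may help make checkpoint, commit, and rollback concrete. It is a sketch under loud assumptions: `copy.deepcopy` stands in for the copy-on-write fork, and holding output in a buffer until commit is just one simple way to respect condition 1. The class and method names are hypothetical.

```python
import copy

class SpeculativeProcess:
    """Toy model of a process that can be checkpointed and rolled back."""

    def __init__(self, state):
        self.state = state
        self.checkpoint = None      # saved state while speculating
        self.pending_output = []    # output held back during speculation
        self.visible_output = []    # what the outside world actually sees

    def begin_speculation(self):
        # deepcopy is an assumption standing in for a copy-on-write fork.
        self.checkpoint = copy.deepcopy(self.state)

    def emit(self, text):
        # Condition 1: speculative output must not become externally visible.
        if self.checkpoint is not None:
            self.pending_output.append(text)
        else:
            self.visible_output.append(text)

    def commit(self):
        # Prediction held: release buffered output and drop the checkpoint.
        self.visible_output.extend(self.pending_output)
        self.pending_output.clear()
        self.checkpoint = None

    def rollback(self):
        # Prediction failed: restore state and discard buffered output.
        self.state = self.checkpoint
        self.pending_output.clear()
        self.checkpoint = None
```

For example, after `begin_speculation()` a call to `emit("x=2")` leaves `visible_output` empty until `commit()` is called, and `rollback()` restores the state saved at the checkpoint.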


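The paper spends a lot of space on how causal dependencies propagate, including how each kind of object carries them: local memory, local files, pipes, FIFOs, sockets, signals, and so on. The authors applied speculation to NFS and to BlueFS (a distributed file system they developed themselves). For mutating operations in NFS, the client passes the dependency information to the server (which requires changing the underlying RPC), so the question of whether the speculative assumption holds is settled on the server side. That counts as a kind of bypass, I suppose.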
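Dependency propagation can be pictured as sets of unresolved speculation ids that flow between processes and the objects they touch. The sketch below is a simplification for illustration only; `Tracked`, `write`, `read`, and `resolve` are hypothetical names, and a real rollback would also restore each object's pre-speculation state.

```python
class Tracked:
    """A process or kernel object (pipe, file, ...) with speculative dependencies."""

    def __init__(self, name):
        self.name = name
        self.deps = set()   # ids of speculations this entity depends on

def write(proc, obj):
    # Writing makes the object depend on everything the writer depends on.
    obj.deps |= proc.deps

def read(proc, obj):
    # Reading makes the reader inherit the object's dependencies.
    proc.deps |= obj.deps

def resolve(spec_id, success, everything):
    """Resolve one speculation; return the names that must be rolled back."""
    affected = [t for t in everything if spec_id in t.deps]
    for t in affected:
        t.deps.discard(spec_id)
    # On success nothing is undone; on failure every dependent is rolled back.
    return [] if success else [t.name for t in affected]
```

If process A is speculative and writes to a pipe that B then reads, B becomes dependent on A's speculation, so a failed prediction has to roll back A, B, and the pipe contents together, while an unrelated process C is untouched.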


Of course, speculative execution was not invented by the authors. Here is how they position their own work:
"To the best of our knowledge, Speculator is the first support for multi-process speculative execution in a commodity operating system and the first use of speculative execution to improve cache coherence and write throughput in distributed file systems."


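P.S. For the cache coherence problem in distributed file systems, some existing strategies are:
1. Polling the file server: the client asks the server whether its cache is still valid; NFS and BlueFS fall into this class.
2. Callbacks: the server notifies the clients; AFS and Coda fall into this class.
3. Leases: the client obtains a lease for exclusive access; SFS (which also uses callbacks) and Echo fall into this class.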


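In short, a good paper, and I recommend reading it. Grafting ideas from one field onto another is great! That said, I have not yet figured out what applications other than DFS could make use of speculative execution.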
