In this paper we describe and evaluate an implementation of CPU-style SIMD ray traversal on the GPU. We show how spreading moderately wide BVHs (up to a branching factor of eight) across multiple threads in a warp can improve performance without requiring expensive pre-processing. The presented ray-traversal method exhibits improved traversal performance, especially for increasingly incoherent rays.