I was using the PGI compiler. Back when I was still working on Fortran-related projects, I wasn't aware of any open source implementation that could automatically take advantage of CUDA GPUs. Not sure about now.
Allegedly GCC supports GPU offloading (PTX in particular, for NVIDIA). I don't know whether the performance is competitive, or whether it can be used to speed up Fortran coarrays (which I, as a non-Fortran programmer, would expect to be the way that functionality is made available in Fortran). OpenMP should work as well.
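For what it's worth, here is roughly what the OpenMP route looks like. This is a minimal sketch in C rather than Fortran, and `saxpy_offload` is just a name I made up for illustration. It assumes a compiler with OpenMP 4.5 or later; whether the loop actually lands on the GPU depends on the compiler having been built with offloading support (with GCC, I believe that means compiling with `-fopenmp -foffload=nvptx-none`). Otherwise the same code should simply run on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i].
   The target directive asks the compiler to run the loop on an attached
   device if one is available; if offloading is not configured, or the
   pragma is ignored, the loop runs on the host instead. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    /* map(to: ...) copies x to the device; map(tofrom: ...) copies y
       to the device and back to the host afterwards. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

You call it like any ordinary function, e.g. `saxpy_offload(n, 2.0f, x, y);`. I can't say anything about how the performance compares to PGI.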