I don't quite understand how Mesa and device drivers play together... can someon...

edwintorok · on June 8, 2014

Because the OpenGL specification is actually something very complex, and doesn't map directly to hardware. There are many things that are higher-level, or that are features of the OpenGL implementation, and not of the hardware. (you'll notice that if you run a newer version of Mesa on older hardware you get some new extensions supported)

So it is beneficial if code can be shared between hardware drivers. You could have a full implementation for each GPU/vendor, each with its own set of bugs, missing features and incompatibilities. Or you can do as Mesa did and try to implement the hardware independent parts in a common place, and have just the hardware-specific parts implemented by the drivers.

For example you definitely don't want a separate implementation of the GLSL compiler for each GPU/vendor, GLSL is compiled to an intermediate representation and that is then further translated to hardware specific instructions. (although in practice I think you still have 2 implementations: one for Intel, and another for all other Gallium drivers)

Think of it this way: would you like a full C compiler written from scratch for each CPU architecture, or would you want to share where possible (parser, type checker, hardware-independent optimizers, etc.).

So an analogy would be: GCC frontend is like core Mesa, and a GCC backend is like a Mesa hardware driver. Another analogy would be libc: the majority of the code is hardware-independent that implements the various C99/POSIX APIs (i.e. like core Mesa), with just the hardware-specific parts / optimizations implemented in assembly (i.e. like Mesa drivers).

zanny · on June 8, 2014

This is the layer of things that make up Linux graphics:

1. Kernel Module Hardware Drivers (radeon, i915, nvidia, etc)

2. DRI to interface with the kernel

3. Mesa DRI Drivers, providing the translation from libgl to DRI / kernel module. Examples are the Intel driver and Gallium architecture, which provides AMD, Nvidia, Tegra, Mali, and Ardreno drivers and share even more code.

4. LibGL, the shared library provided at /usr/lib/libGL.so

Your application links against libGL, which is the implemented OpenGL API.

The distinction between Gallium and Intel can be useful to clarify here - Gallium has an internal API called Winsys that is the minimum command set drivers can support, "state trackers" that provide translation layers for various APIs (OpenGL, OpenCL, Openmax, Direct3D) to that base API.

Mesa libgl will call into the DRI drivers, and if they are Gallium drivers that is through the OpenGL state tracker, where the vendor specific code will get a pass off of shader code and instructions to translate into the winsys layer, which is what the underlying drivers will translate to the command stream for the GPU hardware.

Intel implements that entire part on their own - they have their own separate implementations of openCL (beignet) and openGL to hardware translation.

chubot · on June 8, 2014

You may want to read this (long) article:

http://blog.mecheye.net/2012/06/the-linux-graphics-stack/

Not claiming I understand it all, but it's long because graphics are complicated :) I have been surprised at how device-dependent graphics software is.

I wonder if Windows or OS X has nicer graphics abstractions than Linux? i.e. that make it less of a mess?

mwfunk · on June 8, 2014

I think the graphics stack is a sore point on most or all platforms that have to support multiple GPUs from multiple vendors, at least the stuff in between userland GL calls and the bare metal.

In an ideal world it would be obvious where to draw the line between the GPU- and vendor-independent bits and the common code shared by everyone, but not only is that very hard to do, it can change drastically over the years. Couple that with the fact that platform vendors have to maintain these imperfect interfaces for many years regardless, and that vendors can have their own abstraction layers in between the top layer of their driver code and the bare metal of all their GPUs, and you have a perfect storm of too many layers getting in the way and imperfectly defined interfaces to many of those layers, which necessitates occasional layering violations at every level of the stack if you want to avoid some pathological performance bottlenecks for some GPUs.

At best you end up with a bunch of ugly, imperfect, but generally functional software that more or less gets the job done at an acceptable level of performance.

I would imagine that the situation with Linux just adds yet more issues to that bag of hurt. GPU vendors don't always want to help out, and are likely to withhold valuable technical information that they don't want to be public. The whole process of defining how many layers there are, what functionality exists at what layer, and what interfaces are used between the layers is almost by definition (for GPU driver models at least) a bunch of questions for which there are no good answers, which is also a perfect storm for Open Source development. Any solution anyone comes up with is always going to have some caveats and gotchas, which for open, egalitarian development models means endless arguments and bikeshedding in which everyone involved is correct in pointing out the weaknesses of the system as designed, but there are no alternatives that anyone can point to that don't have similar problems in someone else's eyes.

For example, back in the '90s, for right or wrong I was very optimistic about GGI solving a lot of graphics problems for Linux, and that was limited to the 2D stuff (I can't imagine how controversial it would've been had it included 3D as well):

http://en.wikipedia.org/wiki/General_Graphics_Interface

It didn't pan out that way, but so much energy was burned by so many people arguing about it over so many years. :( I don't have authoritative knowledge of the whole thing, so maybe the doubters were correct, but at the time it felt like a bunch of people shooting down an imperfect solution while offering no solution at all.

EDIT: TL;DR designing a graphics stack that ultimately goes down to the metal is a set of very hard problems, many of which have no perfect answers. This, coupled with secretive vendors and the level of specialized knowledge required to even contribute to stuff like this, makes it an even harder problem in the Open Source world. Not an insurmountable obstacle, but a really hard set of problems nonetheless, for both technical and human reasons.

Danieru · on June 8, 2014

I cannot say how mesa is structured but graphics drivers are not simple for good reasons.

First you don't want every draw call to cause a system trap. Thus some drivers include a dynamic library which gets loaded into the program’s memory space.

Then after the library comes a daemon. Between the library and the daemon is where the OpenGL implementation lives.

The daemon then hands off to the driver. The driver is meant to be a light interface to the GPU. The thinking being the kernel is not the right place to transform OpenGL to hardware instructions. If one did write an OpenGL implementation in a kernel driver then we could expect some harsh words from Linus.

Now on a pedantic note the GPU already handles pushing the bits to the screen without the driver's help. Which is good because you do not want to introduce yet another timing dependent driver class like the networking stack.

I'm not sure where the shader compilation occurs.