Log in

Post a comment - LIBV Intentionally Breaks Videodrivers [entries|archive|friends|userinfo]
Luc Verhaegen

[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

The linux desktop is dead! [Sep. 17th, 2010|03:41 pm]
Luc Verhaegen
[Tags|, , , , , , , , , , , ]
[Current Location |France, Toulouse, XDS2010]
[mood |blahblah]

Or so it will be, soon, if these guys get their way.

Apparently, and this has been the hot new idea for the last year or two; for Xserver 1.10 people want to get rid of one of the greatest things that XFree86 brought us, and one of the better changes that happened after the X.org fork: modular graphics drivers.

While the current proposal is simply to undo the modularization work of the mid-naughties (thanks jezza!), it immediately sparked the imagination of others to go even further (to which Alanc answered rather strikingly). But merging drivers back is in itself already a very damaging move.

So what is the goal behind merging drivers?

The official reason for this is "cleaning up the API", but I fail to see any logical link between being able to clean up APIs and mashing everything together.

There is simply nothing that stops APIs from being improved when drivers are not a full and whole part of the xserver build tree.

A mashed-together tree has no more advantage than a buildsystem like our tinderbox.

And having modular drivers does not mean that one has to have a fully static API and ABI, you just need to have dependable ABI bumping, and, for the sake of overhead, sane and forward-looking API changes. Free software drivers are of course best able to keep in sync with API changes, but this is no different whether they are external or internal to the server build tree.

However, there is a difference in how one approaches API cleanups in a modular world, as one needs to think a bit more about how to do such API changes. This often leads to a cleaner design, a better structure, and it often means that people spend time trying to understand existing code, and how to best adjust it to fit the new needs, without throwing out the baby with the bathwater. By moving the drivers into the xserver tree, and outlawing the API, we will only open the door for having a libpciaccess type breakage every month.

So maybe this is the real wish behind wanting to merge back drivers: being able to pull crazy stunts with halfarsedly designed, badly structured and untested code, without implications, without accountability.

Apart from APIs degrading further, there are other more fundamental issues with this, with actually far reaching consequences.

When tying in the graphics drivers with the X server, the only way one could get driver updates, to get bugfixes, new features or new hardware support, is by installing a new Xserver.

This is probably going to be claimed as a benefit, as people want more testing of upstream code, but a slight increase in usage of upstream code, will mean a much bigger decrease in userbase on released code, and people will be even more afraid of updating anything in their system than today.

But this is how the kernel does it!

We've all heard this from our mothers: "If some other kid jumps off a cliff, is that a reason to jump off that cliff as well?"

Basically, while it might be a good idea for the often much simpler devices that have rather complete drivers (at least compared to graphics drivers :)) in the kernel to be a full and whole part of the kernel, it does not and will not work well for graphics drivers.

The complexity and the amount of movement in graphics drivers, especially with the many parts staying in userspace and the very unstable interfaces to them, makes this rather messy. And the only way that this is feasible is when those drivers are rather stable, and they definitely need to have a very stable ABI to userspace.

No-one will be able to maintain such a level of stability for graphics drivers, and i am sure that no-one will stand up to defend going that route, if this requirement is mixed into the discussion.

How to sneak in a 1 to 1 version dependency between xserver, mesa and the linux kernel... Pt. 1.

In January this year, in the run-up to xserver 1.8, there was a commit to the xserver, labelled "xserver: require libdri 7.8.0 to build", where an autoconf rule was added to depend on this version of "libdri". I believe that this was mainly because of DRI2 changes.

When I say depend here, there is not a complete dependency on a given version of libdri. One can always build the xserver without any DRI support whatsoever. But who, on the desktop, really wants that today?

So while this all-or-nothing decision is in itself questionable, there is another question to be asked here: what is this libdri?

There is a dri.pc on most systems today, and there is a libdri.so on most systems today. The former is a package config file coming from the mesa tree, the latter, is an xserver internal convenience library (hence the lack of so versioning). Smells fishy, doesn't it?

Now, while you might want to spend time looking high and low for the libdri from mesa, you will not find it. Mesa comes with 10 or more different libdris, one for each driver it supports, with the whole of the mesa linked in statically, in the form of driver_dri.so...

Urgh, how broken is that?

So, the xserver now depends on 10 or more different, driver specific, enormous binaries, all because its dri support now depends on a given version of the dri protocol. Or, re-stating that, the xserver depends on a very specific version of the monolithic, 80s style, mesa tree.

Expanding the logic for the xserver and the drivers: why not just mash the mesa and xserver trees together then? :)

More parts come into play... (or dependency Pt. 2)

The xserver depends on the standard drm infrastructure, and this is compatible up to a 4+ year old release of libdrm, namely version 2.3.0, as the basic libdrm code has barely changed since.

Mesa, however, is a different story altogether. It depends, hard, on the latest version of libdrm, and this has been so since Oktober 2008, when intel introduced libdrm_intel in libdrm 2.4.0.

In essence, this libdrm_intel is nothing more than a driver-stack internal convenience library. It only contains code that is specific for intel hardware and the only dependencies are parts of the intel driver stack (if those parts were living separately already). There are no direct dependencies from anything else.

But, ever since Oktober 2008, both the intel x driver and the intel mesa driver depend on the latest libdrm version, and since then, both radeon and nouveau joined in the frenzy.

So, while there might be some backwards compatibility between dri drivers and libdrm drivers, the reality is that intel, radeon and nouveau are today playing hopscotch. Because mesa is monolithic, and at least one of its drivers is going to depend on the latest libdrm version, the whole of monolithic mesa simply depends on the latest libdrm version.

Since mesa has been depending on the latest libdrm for a few years now, and the xserver has been depending on the latest mesa version since the start of 2010, in turn, the xserver now depends on the latest libdrm version.


How does this tie in the kernel? (dependency Pt. 3).

Well, since libdrm has the driver specific sublibraries, those of course call drm driver specific ioctls, and of course, these ioctls change all the time. While some people claim that they try to abstract at this layer (and that this strategy is good enough for everyone...), and claim to try to keep the kernel to userspace interface stable, this of course is only true for a very limited range of kernel and userspace parts. Now, we have intel, radeon _and_ nouveau playing at this level, dividing whatever median compatibility range there is, by three.

The result is that libdrm can pretty much only be backwards compatible to the kernel by accident.

So, continuing our logic from earlier, the latest xserver depends on the latest mesa, the latest libdrm and the latest kernel.

Smashing lads! Well done! And all of this on a set of connections and foundations that make a house of cards look like a block of granite.

The root of the problem.

Graphics hardware is horribly complex. Several years ago, a single graphics card already broke the terraflop boundary, managing what a huge IBM supercomputer only managed a good decade earlier. Single graphics cards come with many hundreds of shader cores, running at frequencies above 1Ghz, have multiple gigabytes of ram, eat 200+ Watts, and can drive up to 6 displays today. There is no other single piece of hardware which is this complex.

And this complexity is of course also there in software.

You cannot count the different parts of a modern graphics driver stack on free software on one hand anymore. There is the kernel drm part, the firmware, the libdrm part, the X driver, a pair of mesa drivers, an xvmc and possibly another media acceleration driver. A graphics driver stack, can be made up of up to 8 parts today.

All of those parts are scattered over the system. There is 2 parts shipped with the kernel, 1 part shipped with libdrm, 2 drivers shipped with mesa, and the remainder can be found in an xf86-video tree.

Naturally, in order to work most optimally, these different parts have a very direct and acute dependency on each other. Bugs, new features and new hardware support usually incur changes to interfaces between those different parts all the time.

The way that those different parts are spread all over the place today make it almost impossible to have an optimal setup. Most of the time one is glad if it works at all. What's more, this spread is the core reason for the de-facto 1-1 version tie between kernel, libdrm, xserver and mesa.

The consequences of a 1-1 version tie between kernel, xserver and mesa.

With graphics hardware and graphics drivers being this complex, there is simply no way to have them in a bugfree or a constant "useful" state.

We just _have_ to live with the fact that graphics drivers will be buggy, and we should try to handle this as gracefully as possible.

This means that we should be able to replace all or parts of the graphics driver stack at any time, without negatively affecting other parts of the system.

This is what our audience, our customers as it were, expect from us.

But, by having kernel, libdrm, xserver and mesa tied together, and the different parts of the driver stack spread over them, it is impossible to exchange 1 part of the graphics driver stack, or to exchange just the graphics driver stack, without changing the whole.

By forcing our users to update all this infrastructure each, we will usually trigger a cascade of updates that reach far up the whole software stack, to the extent where trying to fix some small issue in the graphics driver, might mess up openOffice or another program that your average linux desktop user depends on.

Also, what is the chance of getting both wireless, suspend/resume and your graphics driver working to an acceptable level at the same time? This becomes very very small, and when it does work, you better not run into issues somewhere else, as an update might ruin that very precarious balance.

Killing the desktop for everyone.

No normal person can then run a free software desktop system, and expect to use it, because an arbitrary mix of hardware cannot possibly work together acceptably, at least not for a measurable amount of time.

What will be left over is preloads and embedded system.

Preloads is when some OEM, either itself, or through a linux distributor, spends many many man-years on making all parts work together properly. In the end, images will be produced which install on a very specific system and cannot be updated or maintained, except by a specialised team of people. Embedded systems basically work the same way: one combination of hardware, one image, no updates for average users except those provided by the manufacturer or their partners.

So while people might buy a free software based system in a shop around the corner, and be somewhat happy with it for a while, normal desktop users will be left out in the cold.

Looking further, by shutting out our own users, we will take away the breeding ground that free software is based on.

What solution is there?

By now, that should be pretty obvious.

Bring the different parts of the graphics driver stack together, and make its parts independent of the infrastructure they depend on.

This allows driver developers to change internal structure and API at will, while at the same time providing the infrastructure compatibility that users, hardware and distribution vendors require.

All it takes is a little care in designing infrastructure APIs, and a little care in keeping driver stacks compatible, even if that compatibility comes at the cost of disabling some features for some combinations of the infrastructure.

This is not hard to do, and it is done in multiple places.

Why the Nvidia binary driver is that popular.

In a recent phoronix survey, the amount of users using Nvidia hardware and drivers is larger than the users using any other combination.

This has a reason, and it has nothing to do with Nvidia being a completely closed source shop. Nvidia gives users the ability to install any graphics driver stack, and it should mostly be compatible with the environment it is installed in. This is simply what our users need.

What is affected by Nvidia being binary only, is that Nvidia has to put in a lot of work on making things compatible. Free software drivers have a much much easier task, or at least they would, if they, and the infrastructure they depend on, was developed in a different fashion than is the case today.

An open proof of concept.

My talk at FOSDEM, of course mentions my unichrome driver a lot, as it pretty much is my playground these days.

Even though the featurelist of this driver is very limited, it is now integrating X, DRM and DRI drivers in one massively backwards compatible build-system, with autotools detecting all the API changes across all currently used versions of the necessary infrastructure. What one can see there is that, when some care is taking in structuring the driver, it is not that hard to achieve this: it basically just takes the will to do this.

When I talked at FOSDEM, some people were stating that, while it might be possible for DRM and the Xserver, it would be totally impossible on Mesa/DRI, but for Mesa/gallium it should be easy.

In the next month or so, I took all Mesa versions that were out in the wild, and split off the main libraries from the actual DRI drivers, created a set of headers as required by the drivers, created package config files, and then move the drivers out to their own git repositories. Basically, a DRI SDK was created, and the drivers were now building and running externally to this SDK. This across 3 years of DRI development.

When I took that back to the Mesa community, what I of course got was indifference, and, suddenly, claims that while this SDK might be possible for mesa/DRI it would definitely not be possible for Mesa/gallium!

The future?

The proposed future direction for graphics drivers is to create graphics driver stacks. If not, we, the developers, might just as well stop working on free software graphics drivers altogether.

And while the current situation currently is bad, it is not impossible to fix. The problems are known and clear, a path to the solution should by now also be clear, but the willingness to put in the bit of extra thought is simply lacking.

So guys, if you really want to move into the wrong direction, please state the real reasons for doing so, state the consequences to your users; and know what the end result will be.

post comment:

No HTML allowed in subject


(will be screened)