LIBV Intentionally Breaks Videodrivers
Luc Verhaegen
Q3A with open source generated shaders! [Mar. 18th, 2013|08:28 pm]
[Current Location |Rumpelkammer]
[mood |Happy]
[music |Cornershop - Brimful of asha (norman cook remix)]

The combination of limare and open-gpu-tools can now run Quake 3 Arena timedemo without depending on the binary driver for the shader compiler!

Connor Abbott has been his amazing (16-year-old!) self again in the weeks after his talk at FOSDEM, and he pushed his compiler work in his open-gpu-tools tree to the point where it can handle basic vertex shaders. Remember that our vertex shader is a rather insane one, where the compiler has to work really hard to get scheduling absolutely right. This is why an assembler for our vertex shader was not too useful, and the better part of a compiler had to be written before it could generate useful results. A mammoth task, and Connor's vertex shader code is now larger than the code I have in my limare library.

So it was high time that we brought limare and OGT together to see what they were capable of with some basic shaders. Luckily, the Q3A GLES1 emulation has basic shaders, what a nice coincidence :)

So Connor turned my simple vertex shader ESSL into the high-level language used by the OGT vertex shader compiler, and, through the steps described at this wiki page, turned them into MBS files (Mali Binary Shader - the file type output by the standalone compiler, and also by the newer compilers integrated in the binary driver). Limare can then load and parse those MBS files, and run the shaders. No need to involve the ARM binary anymore when we have OGT generated MBS files :)

The result was quite impressive. We had a few issues where the limare driver (which has mostly taken its cues from the output of the binary driver) and OGT disagreed over symbol layout, but apart from that, bringing up the shaders Connor produced was pretty painless. Amazingly effortless, for such a big step.

Connor then spent another day playing with the fragment shader assembler, fixed some bugs, and produced three fragment shaders for us: one for the clear shader used by limare directly, and two for Q3A. After some more symbol layout issues, these also just worked! We even seem to be faster, within the error margin, with the MBS files (due to texture coordinate varyings being laid out differently).

So this is a really big milestone for the lima driver project. Even with our insane pre-optimized architecture, we now are able to run Quake 3 Arena without any external dependencies, and we are beating the ARM binary while doing so.

For generating your own shader MBS files, check out Connor's OGT, and then you can head straight to Connor's wiki page. My Q3A tree now has the MBS code included directly. And I pushed a dirty version of my FOSDEM limare code.

As for this new limare code, this fosdem_2013_pile branch will vanish soon, as I still need to properly pry things apart. This is run-for-the-prize code, and often includes many unrelated fixes in the same commit. It's better to do archeology on it now than three years from now, so this needs to be split. But in the meantime, you all can go and give Q3A on a fully free driver stack on Mali hw a go :)

I will not post a video, as there really is nothing new to see. It is the exact same timedemo, running a few per mille faster. Build things, and then run it yourself on your sunxi hardware (I am still working on porting it to the new kernel of a more powerful platform). That's the best proof there is!

For building limare, check out the fosdem2013_pile branch and then just run make/make install.

For building Q3A all you need to do is run:
make ARCH=arm USE_LIMARE=1
And, when you have the full quake installed in ~/ioquake3/baseq3, you can create a file called ~/ioquake3/baseq3/demofour.cfg with the following content:
cg_drawfps 1
timedemo 1
set demodone  "quit"
set demoloop1 "demo four; set nextdemo vstr demodone"
vstr demoloop1
You can then run the ioquake3 binary with "+exec demofour.cfg" added to the command line, and you will have the demo running on top of fully free software!

Now we really have covered all the basics, time to find out how Mesa will play with our plans :)

Hey ARM! [Feb. 13th, 2013|05:13 am]
[Current Location |Rumpelkammer]
[mood |Happy]
[music |Pixies - Surfer Rosa - River Euphrates]

Quake 3 Arena code.

I pushed out the Quake 3 Arena code used for demoing limare and for benchmarking.

You can build it on your linux-sunxi with
    make ARCH=arm

or, for the limare version (which will not build without the matching limare driver, which i haven't pushed out yet :))
    make ARCH=arm USE_LIMARE=1

for the GLESv2 version (with the broken lamps due to missing alphatest):
    make ARCH=arm USE_GLES2=1

Get a full Quake 3 Arena version first though, and stick all the paks in ~/ioquake3/baseq3. Add this to demofour.cfg in the same directory:
    cg_drawfps 1
    timedemo 1
    set demodone  "quit"
    set demoloop1 "demo four; set nextdemo vstr demodone"
    vstr demoloop1

To run the timedemo, run the quake binary with +exec demofour.cfg added to the command line.

For your own reverse engineering purposes, to build the GLESv1 version with logging included, edit code/egl/egl_glimp.c, and remove the // before:
    //#define QGL_LOG_GL_CALLS 1

But be aware, you are not free to spread that dumped data. That is id Software data, albeit in a raw form.

I'd be much obliged if anyone hacks up input support, or re-adds sound. Or even adds the missing GLES2 shaders (color as a uniform for instance). That would make this code playable from the console, and should make it easier for me to provide playable limare code.

As you all can see, we have nothing to hide. The relevant differences between GLES1 and limare are in the GL implementation layer. I did shortcut vertex counting to ease logging, but this only has a limited effect on CPU usage. The lower CPU usage of limare is not significant or interesting, as we do less checking than a full driver anyway. In simpler tests (rotating cubes), our scheduling results in a much higher CPU usage though (like 50% more, from 10% to 15% :)), even if we are not significantly faster. As said in my previous post, I am not sure yet whether to keep this, or to find some improvements. Further tests, on much more powerful hardware, will tell.

Connors Compiler Work.

Connor had a massive motivation boost from FOSDEM (and did not suffer from the bug that was going round, which hit so many of us in the last week). Earl from zaReason is sending him an A13 tablet, which should spur him on even further.

He has been coding like a madman, and he is now close to being able to compile the relatively simple shaders used in the Quake 3 Arena demo. He still has to manually convert the ESSL of the vertex shader to his own GP_IR first though, but that's already massive progress which gets us very close to our goals.

I am going to add MBS loading (Mali Binary Shader format) to limare to be able to forgo the binary compiler and load pre-compiled shaders into our programs. Since MBS is also spit out by the open-gpu-tools, we can then distribute our own compiled MBS files directly, and provide a fully open source Q3A implementation on mali.

How cool is that!

The very near future.

With my post purely about Q3A and its numbers, the reactions were rather strange. It seems like a lot of people got hung up exclusively on us being only 2% faster, or on us using this "ancient" game. The blog entry itself fully explained why this ancient game was actually a very good choice, yet only very few read it. Very few realized what a massive milestone an almost pixel-perfect Quake 3 Arena is for a reverse engineered driver.

As for performance... When I started investigating the mali, I had postulated that we would be happy to have only 75% of the performance of the binary driver. I assumed that, even with performance per watt being mighty important in the mobile space, 75% was the threshold at which the advantages of an open source driver would outweigh the loss of performance. Only ARM's big partners would then end up shipping ARM's own binaries. And for projects like CyanogenMod and proper linux distributions there would be no question about what to ship.

With Q3A, and with the various rotating cubes, we now have proven that we can have a 100% match in performance. Sometimes we can even beat performance. All of this is general, no Q3A specific tricks here!

This is absolutely unique, and is beyond even the wildest dreams of a developer of any reverse engineered driver.

Absolutely nothing stops us now from delivering an open source driver that broadly matches the binary driver in performance! And this is exactly what we will be doing next!

Hey ARM!

We are not going away, we are here to stay. We cannot be silenced or stopped anymore, and we are becoming harder and harder to ignore.

It is only a matter of time before we produce an open source graphics driver stack which rivals your binary in performance. And that time is measured in weeks and months now. The requests from your own customers, for support for this open source stack, will only grow louder and louder.

So please, stop fighting us. Embrace us. Work with us. Your customers and shareholders will love you for it.

-- libv.

Quake 3 Arena timedemo on top of the lima driver! [Feb. 6th, 2013|02:08 am]
[Current Location |Rumpelkammer]
[mood |accomplished]
[music |Cinematic Orchestra - The revolution]

At FOSDEM, I had a mainline talk about "Open ARM GPU Drivers", going over all the projects and GPUs, talking about the developers doing the hard reverse engineering work and the progress that they have made so far. I will write up a blog entry summarizing this talk soon, but for now i will just talk about the Lima demo i showed at the end of the talk.

Let me get straight to the main point before delving into details: We now have a limare (our proto/research driver) port of Quake 3 Arena which is running the q3a timedemo 2% faster than the binary driver. With 3% less cpu overhead than the binary driver to boot!

Here is the timedemo video up on youtube. It is almost pixel-perfect, with just a few rounding errors introduced due to us being forced to use a slightly different vertex shader (ESSL, pulled through the binary compiler instead of a hand coded shader). We have the exact same tearing as the binary drivers, which are also not synced to display on the linux-sunxi kernel (but ever so slightly more tearing than the original ;)).

This Q3A port is not playable for a few reasons. One is, I threw out the touchscreen input support, but never hacked in the standard SDL based input, so we have no input today. It should be easy to add though. Secondly, I only include the shaders that are needed for running the timedemo. The full game (especially its cut scenes) requires a few more shaders, which are even simpler than the ones currently included. I also need to implement the equivalent of glTexSubImage2D, as that is used by the cut scenes. So, yes, it is not playable today, but it should be easy to change that :)

We are also not fully open source yet, as we are still using the binary shader compiler. Even after begging extensively, Connor was not willing to "waste time" on hand coding the few shaders needed. He has the necessary knowledge to do so though. So who knows, maybe when I push the code out (the q3a tree is a breeze to clean, but the lima code is a mess, again), he might still give us the few shaders that we need, and we might even gain a few per mille in performance still :)

I will first be pushing out the q3a code, so that others can use the dumping code from it for their own GPU reverse engineering projects. The limare code is another hackish mess again (but not as bad as last time round), so cleaning that up will take a bit longer than cleaning up q3a.

Why frag like it is 1999?

Until now, I was mostly grabbing, replaying, and then porting EGL/GLES2 programs that were specifically written for reverse engineering the mali. These were written by an absolute openGL/openGLES newbie, someone called libv. These tests ended up targeting very specific but far too limited things, and had very little in common with real world usage of the GPU. As most of the basic things were known for mali, it was high time to step things up a level.

So what real world OpenGL(ES) application does one pick then?

Quake 3 Arena of course. The demo four timedemo was the perfect next step for reverse engineering our GPU.

This 1999 first person shooter was very kindly open sourced by id Software in 2005. Oliver McFadden later provided an openGLES1 port of ioquake3 for the Nokia N900. With the Mali binary providing an openGLES1 emulation library, it was relatively easy to get a version going which runs on the Mali binary drivers. Thank you Oliver, you will be missed.

The Q3A engine was written for fixed 3D pipelines, and this has some very profound consequences. First, it limits the dependency on the shader compiler and allowed me to focus almost purely on the command stream. This completely fits with the main strategy of our reverse engineering project, namely it being two almost completely separate projects in one (command stream versus shader compilers). Secondly, and this was a nice surprise when I started looking at captures, the mali openGLES1 implementation had some very hardware specific optimizations that one could never expose with openGLES2 directly. Q3A ended up being vastly more educational than I had expected it to be.

With Q3A we also have a good benchmark, allowing us to get a better insight into performance for the first time. And on top of all of that, we get a good visual experience and it is a dead-certain crowdpleaser (and it was, thanks for the cheers guys :))

The only downside is that the data needed to run demo four does not come with the q3a demo release and is therefore not freely downloadable. Luckily you can still find Q3A CDs on ebay, and I have heard that steam users can easily download it from there.

The long story

After linuxtag, where I demoed the rotating companion cube, I assumed that my knowledge about the mali was advanced enough that bringing up Q3A would take only a few weeks. But as these things usually go, with work and real life getting in the way, it never pans out like that. January 17th is when I first had q3a working correctly: time enough to still worry about some optimization before FOSDEM, but only just.

I started with an android device and the kwaak3 "app", which is just Oliver's port with some androidiness added. I captured some frames to find out what I still missed with limare. When I finally had some time available, I first spent it cleaning up the linuxtag code, which I pushed out early December. I had already brought up Q3A on linux-sunxi with the mali binary drivers, as can be seen from the video I then published on youtube.

One thing about the youtube video though... Oliver had a tiny error in his code, one that possibly never showed up on the N900. In his version of the texture loading code, the lightmaps' original format would end up being RGB whereas the destination format is RGBA. This difference in format, and the in-driver conversion, is not supported by the openGLES standard. This made the mali driver refuse to load the texture, which later on had the driver use only the primary texture, even though a second set of texture coordinates was attached to the command stream. The vertex shader did not reflect this, and in my openGL newbieness I assumed that Ben and Connor had a bug in their vertex shader disassembler. You can clearly see the flat walls in the video I posted. Once I fixed the bug though, q3a suddenly looked a lot more appealing.

I then started with turning the openGLES1 support code in Quake's GLimp layer into a dumper of openGLES1 commands and data, in a way that made it easy to replay individual frames. Then I chose some interesting frames, replayed them, turned them into a GLES2 equivalent (which is not always fully possible, alphaFunc comes to mind), and then improved limare until it ran the given frames through nicely (the mali has hw alphaFunc, so limare is able to do this directly too). Rinse and repeat, over several interesting frames.

By the evening of January the 16th, i felt that i knew enough to attempt to write a GLimp for limare. This is exactly when my father decided to give me a call. Some have met him at Le Paon last Friday, when he, to my surprise, joined us for a beer after work as his office is not far away. He remarked that i seemed "a bit on edge" when he called on the 16th. Yes, i indeed was, and how could i be anything else at a time like this :) I hacked all night, as at the time i was living purely at night anyway, and minutes before my girlfriend woke up i gave it my first shot. Crash, a stupid bug in my code. I told my girlfriend that i wouldn't join her for "breakfast" before i went to bed, as i was simply way too close. By the time she left for work, i was able to run until the first few in-game frames, when the rendering would hang, with the mali only coming back several seconds later. After a bit of trying around, i gave the GP (vertex shader) a bit more space for its tile heap. This time it ran for about 800 frames before the same thing happened. I doubled the tile-heap again, and it ran all the way through!

The evening before i had hoped that i would get about 20fps out of this hardware. This already was a pretty cocky and arrogant guess, as the binary driver ran this demo at about 47.3fps, but i felt confident that the hardware had little to hide. And then the demo ran through, and produced a number.


Way beyond my wildest dreams. Almost 65% of the performance of the binary driver. Un-be-liev-ab-le. And this was with plain sequential job handling. Start a GP job, wait for it to finish, then start the PP job, wait for it to finish, then flip. 30.5fps still! Madness!

I had two weeks left for FOSDEM, so i had a choice, either add input support and invite someone from the public to come and play before the audience, or, optimize until we beat the binary driver. The framerate of the first pass decided that, optimization it was. I had a good benchmark, and only a third of the performance needed to be found, and most of the corners for that extra performance were known.

My first optimization was to tackle the PP polygon list block access pattern. During my previous talk at FOSDEM, I explained that this was the only bit I found that might be IP encumbered. In the meantime, over the weekly beers with Michael Matz, the SuSE Labs toolchain lead, I had learned that there is a thing called the "Hilbert space-filling curve". Thanks Matz, that was worth about 2.2fps. I benchmarked another few patterns: two-level hilbert (inside the plb block, and out), and the non-rotated hilbert pattern that is used for the textures. None would give us the same performance as the hilbert curve.

Building with -O3 then gave us another 1.5fps. Passing vec2s between the shaders gave us 0.3fps. It was time to put in proper interleaved job handling. With the help of Marcus Meissner (the SuSE Security lead), an ioctl struct sizing issue was found for the job wait thread. This fixed the reliability issues with threading on the r3p0 kernel of linux-sunxi. (ARM! Stable kernel interfaces now!) But thanks Marcus, as proper threading and interleaved job handling put me at 40.7 fps!

And then i got stuck. I only had 40.7fps and knew nothing that could account for such a big gap in performance. I tried a few things left and right, but nothing... I then decided to port q3a to GLES2 (with the loss of alphafunc and buggered up lamps as a result) to see whether our issue was with the compiled versus hand-coded shader. But I quickly ran into an issue with multi-texture program state tracking, which was curious, as the lima code was logically the same. Once this was fixed the GLES2 port ran at about 47.6fps, slightly faster than GLES1, which i think might be because of the lack of alphafunc.

Immediately after that I ported the multi-texture state tracking fix to the limare GLimp, but sadly got no change in framerate out of it. Strangely, it seemed like there was no multitexturing going on, as my debugging printfs were not being triggered. I then noticed the flag for telling Q3A that our GL implementation supports multitexturing. Bang. 46.7fps. I simply couldn't believe how stupid that was. If that had been correct on the first run, I would've hit above 75% of the framerate, how insane would that have been :)

For the final 1.5fps, which put us at 48.2fps, i added a third frame, this while only rendering out to two framebuffers. Job done!

Adding a fourth frame did not improve numbers, and i left some minute cpu usage and memory usage optimizations untouched. We are faster than the binary driver, while employing no tricks. We know what we need to know about this chip and there is nothing left to prove with Q3A performance.

The numbers.

The fact that we are slightly faster is actually normal. We do not have to adhere to the OpenGLES standard, we can do without a lot of the checking that a proper driver normally needs to do. This is why the goal was not to match the binary driver's performance, but to beat it, which is exactly what we achieved. From some less PP and CPU bound programs, like the spinning cubes, it does seem that we are more aggressive with scheduling though.

Now let's look at some numbers. Here is the end of the timedemo log for the binary driver, on an Allwinner A10 (single cortex a8, at 1GHz), with a Mali-400MP1 at 320MHz, rendering to a 1024x600 LCD, with framerate printing enabled:
    THEINDIGO^7 hit the fraglimit.
    marty^7 was melted by THEINDIGO^7's plasmagun
    1260 frames 27.3 seconds 46.2 fps 10.0/21.6/50.0/5.6 ms
    ----- CL_Shutdown -----
    RE_Shutdown( 1 )

And here is the end of the timedemo log for the limare port:
    THEINDIGO^7 hit the fraglimit.
    marty^7 was melted by THEINDIGO^7's plasmagun
    ]64f in 1.313632s: 48.719887 fps (1280 at 39.473158 fps)
    1260 frames 26.7 seconds 47.2 fps 9.0/21.2/74.0/5.6 ms
    ----- CL_Shutdown -----
    RE_Shutdown( 1 )
    ]Max frame memory used: 2731/4096kB
    Auxiliary memory used: 13846/16384kB
    Total jobs time: 32.723190 seconds
       GP job  time: 2.075425 seconds
       PP job  time: 39.921429 seconds

Looking at the numbers from the limare driver, my two render threads are seriously overcommitted on the fragment shader (PP). We really are fully fragment shader bound, which is not surprising, as we only have a single fragment shader. Our GP is sitting idle most of the time.

It does seem promising for a quad-core mali though. I will now get myself a quad-core A9 SoC, and put that one through its paces. My feeling is that there we will either hit a wall with memory bandwidth or with the CPU, as q3a is single threaded. Since limare does not yet support multiple fragment shaders, the last remaining big unknown will get solved there too.

Another interesting number is the maximum frame time. 50.0ms for the binary driver, versus 74.0ms for limare. My theory there is that i am scheduling differently than the original driver and that we get hit by us overcommitting the fragment shader. Wait and see whether this difference in scheduling will improve or worsen the numbers on the potentially 4 times faster SoC. We will not be context switching anymore with our render threads, and we will no longer be limited by the fragment shader. This should then decide whether another scheme should be picked or not.

Once we fix up the Allwinner A10 display engine, and can reliably sync to refresh rate, this difference in job scheduling should become mostly irrelevant.

The star: the mali by falanx.

In the previous section I was mostly talking about the strategy of scheduling GP and PP jobs, of which there tends to be one of each per frame. Performance optimization is a very high level problem on the mali, which is a luxury. On mali we do not need to bother with highly specific command queue patterns which most optimally use the available resources, patterns which end up being SoC and board specific. We are as fast as the original driver without any trickery, and this has absolutely nothing to do with my supposed ability as a hacker. The credit fully goes to the design of the mali. There is simply no random madness with the mali. This chip makes sense.

The mali is the correct mix of the sane and the insane. All the real optimization is baked into the hardware design. The vertex shader is that insane for a reason. There is none of that "We can fix it in software" bullshit going on. The mali just is this fast. And after 20 months of throwing things at the mali, i still have not succeeded in getting the mali to hard or soft lockup the machine. Absolutely amazing.

When I was pretty much the only open source graphics developer pushing display support and modesetting forwards, I often had to hear that modesetting was easy, and that 3d is insane. The mali proves this absolutely wrong. Modesetting is a very complex problem to solve correctly, with an almost endless set of combinations, one that requires very good insight and the ability to properly structure things. If you fail to structure it correctly, you have absolutely no chance of satisfying 99.9% of your users; you'll be lucky if you satisfy 60%. Compared to modesetting, 3D is clearly delineated, and it is a vastly more tractable and manageable problem... Provided that your hardware is sane.

The end of the 90s was an absolute bloodbath for graphics hardware vendors with just a few, suddenly big, companies surviving. That's exactly when a few Norwegian demo-sceners, at the Trondheim University, decided that they would do 3D vastly better than those survivors and they formed a company to do so, called Falanx. It must've seemed like suicide, and I am very certain that pretty much everybody declared them absolutely insane (like engadget did). Now, 12 years later, seeing what came out of that, I must say that I have to agree. Falanx was indeed insane, but it was that special kind of insanity that we call pure genius.

You crazy demo-sceners. You rock, and let this Q3A port be my salute to you.

FOSDEM Aftermath. [Feb. 10th, 2012|01:56 pm]
[Current Location |desk]
[mood |accomplished]
[music |QOTSA - QOTSA - Regular John]

FOSDEM was awesome this year. We had an overbooked schedule for our DevRoom, we inaugurated the beautiful and fantastic K building, and i got to present the lima driver.

First off, i would like to thank the FOSDEM organizers and the ULB. The already unique event that is FOSDEM just keeps getting better and better. Pascal & friends: congratulations, like every year, you've outdone yourselves.

Secondly, i would like to thank all the speakers in my devroom. It is clear by now why the first-come-first-serve algorithm has to be used, and it is also clear that it is working. But thank you all for making this a successful event (even Chris, who couldn't make it due to a train derailment). I hope you guys had a lot of fun too, both during your talk and with the rest of FOSDEM.

Lastly, to all those who attended my talk (and those who couldn't get in anymore as well): Thank you all for your very positive feedback. No matter what happens with lima in future, this talk will be the most memorable moment. (oh, and a big thanks to Will Stephenson, from SuSE and KDE, for getting a webcam up that quickly). To whoever shouted something along the lines of "we don't see that, it looks like a perfect cube to us" when the caching went off in the rotating cube hack: this is the open source spirit in its most tangible form. Thank you very much.

To end this post, let me plug the lima website again. We also have a mailinglist and the #lima channel on freenode. The limare code has been available since yesterday night. Heise and lwn posted the story already, and the videos from the FOSDEM talk should soon hit phoronix as well.

XDC 2012: Nuremberg! [Dec. 9th, 2011|05:32 am]
[Current Location |work room, desk]
[mood |drunkdrunk]
[music |Downtown rush echoing...]

Yes! The board has decided! XDC comes to Nuremberg!

For 2012 we (Egbert Eich, Professor Hopf, and I) will be hosting the annual X conference in Nuremberg!

Egbert will try to get the main SuSE conference room, or, failing that, Matthias will try to get us a university aula, so the venue itself will work itself out beautifully one way or another. Then... Nuremberg is one of those places which is perfect for large crowds who need food and some liquids in the evening (Franconian/Bavarian beergarten culture), so it is the perfect (and highly affordable) conference area from that point of view. And, the best part: even though Nuremberg is not the international hub that Frankfurt is, or the european hub that Munich is, it is halfway between the two, and travel is relatively easy from either of those points. You either take the plane, or you take a much more comfortable train from either airport, and get to Nuremberg in pretty much the same time. You can really save a lot by comparing those two airports when flying inside European airspace, and this for no time difference. One insider tip though: you get to ride the ICE at full speed (300+km/h!) when traveling from Munich (you do have to endure the rather pedestrian S-Bahn for 45 minutes first though).

Anyway, the main action item now is that Egbert can start to poke SuSE to see when their main conference room is available for 3 days in September 2012 (a working network and enough power sockets are a given then!). I doubt that we will get an answer within the three remaining weeks of this year.

The actual proposal e-mail sent to the board is sadly only available to foundation members, but a wiki page will soon be created which recreates most of that information. But rest assured, we will get close to the wonderful experiences of XDC Toulouse (thank you Matthieu!) and XDC chicago (thank you Michael!) indeed!

(oh, and btw, we have a FOSDEM DevRoom this year, which is rapidly getting its schedule filled! If you are coming, get your talk in right now: first come, first serve!)

Wheelbuilding... [Aug. 13th, 2011|12:46 am]
[Current Location |rumpelkammer]
[mood |accomplished]
[music |Portishead - Glorybox]

(Bicycle-) Wheel building is an art. An art perfectly suited for a geek; it requires technical insight, knowledge, feeling and some experience. For those interested, here are some tips and pointers from my own experience.

1) Buy "Professional guide to Wheel Building" by Roger Musson, it is going to be the best 9GBP you have spent in a long long while. [HINT1]
2) Read it, twice!
3) Buy the rims and hubs before you buy the spokes (and get the necessary tools too if you haven't already).
4) Measure the ERD (Effective Rim Diameter) using the old-spokes with glued-on nipple method that Roger describes [HINT2].
5) Buy the spokes that spocalc.xls then calculates for you.
6) Lace your wheel like Roger describes, to the letter.
7) Tension your wheel like Roger describes, to the letter [HINT3].

HINT1: Do not read any other sources, Gerd Schraner's book is just pure nostalgia and does not help you much. Especially his explanation for tensioning your spokes should be ignored: while it might get you a straight wheel, your spokes might have wildly varying tension, and are therefore likely to either break due to fatigue or have the wheel go out of true quickly.

HINT2: For creating the cut-off spokes for measuring the ERD as Musson describes: screw your nipples onto your spokes so that the spoke only _just_ comes out of the nipple into the groove for the nipple-driver. This is the measuring length you should use. If you use the absolute top of the nipple for measuring the length, then you will have no room for error, and you will very likely use up all of the thread on the spoke while bringing the wheel up to full tension (this is the experience bit right here). If the spoke is still inside the nipple, then you will most likely end up with spokes that are too short, with thread still showing; this too is a nightmare for wheel-building (your nipple-driver will not disengage). Once you bring your wheel up to its final tension, the spoke (especially a double butted spoke) will come slightly further out of the nipple than with the measurement spokes.

HINT3: For the final stage of tensioning, where the spokes tend to twist along with the spoke-key, I marked the rim-side of the spokes with differently coloured alcohol markers. This let me see the twisting of the spokes, and undo it, close to the rim and nipple, without getting in the way of the spoke-key. Since this is an alcohol based marker on stainless steel, you can rub it off afterwards, or just wipe it off with some alcohol. I kept mine on, knowing full well that most of it will disappear soon enough in the rain and mud.

I am using Extreme Airline 3s, which I got from Rose. These are rather deep rims that are very stable and sturdy, and they still have a wear-indicator. The joint is not done well, and you will always have a third of a mm or so difference in diameter there, but for trekking or mtb tires this is no issue; it is just annoying when working on the wheels in the stand. Because these rims are so sturdy, the Schraner method becomes quite unreliable: you can much more easily get away with unevenly tensioned spokes, as the rim is much more likely to even the differences out for you instead of showing where they are. You actually need to pluck the spokes instead, like Musson describes, early on in the tensioning process, to get rid of the differences in tone and therefore tension.

I ordered a pre-built set of Airline 3s (28" with LX hub and 3N72 dynamo) from Rose more than a year ago, and they seem quite sturdy and have served me well so far. But, sadly, these pre-built wheels were not up to tension, which I could hear on steep climbs as the spokes were rubbing against each other under heavy and changing load. They were subsequently very hard to tension further; my guess is that the nipples were badly oiled before assembly.

Recently I built a first set according to the Schraner method, and while this went well, and the wheels feel good, I am not sure how good they really are, as I haven't used them yet. It could be that they go out of true quickly, especially after pumping quite a bit of heat into them going down some slope in the Fränkische Schweiz. The spoke lengths I used for the 28" Extreme Airline 3 rims, triple crossed (of course!) and with 12mm nipples, are 276mm for the front, and 281/283mm for the back.

For the 26" version of the same rim, with the same hubs, same lacing, same nipples, I used 246mm for the front, and 251/253mm for the back. This was calculated with spocalc after measuring the rim according to Musson, and the ERD is 523mm (after correcting for my mistake). These wheels are for a velotraum cross crmo frame that I am just now building up, so there are no kms on them either, but I have a very, very good feeling about them, as I did use Musson's book for them, and the wheels came together as well as or even better than described. So while my own handiwork is still untested in real-life conditions, at least I can tell the difference between Schraner and Musson, and the spoke lengths are (now) correct too ;)

In any case, if you are into cycling, and have done and mastered all the other jobs around the bicycle already, then try your hand at wheel-building to complete the skill-set. It is not black magic; it is actually highly logical, but you should not use sentences like "How hard can it be?" or "Right, I'll get my hammer!" when doing so. Read the right book, get the right tools and the correct spokes, and then take your time; it is really very rewarding.

This way, the free software desktop is never going to make it. [Jan. 15th, 2011|04:52 pm]
[Tags|, , , , , ]
[Current Location |Couch]
[mood |pissed off]

In order to get easier access to Nokia things, and to boost security (as in, encrypt stuff, for a change), I've been reinstalling my trusty hp 6715b. Most nokians use ubuntu, so I went for 10.04 LTS. I already severely disliked the way in which you have no installation options to choose from. You get the grandmother version every time, no "I have a clue, let me decide what I want to do myself" button anywhere.

I was lucky: in 10.04 my now 3 year old graphics card was still working out of the box. But, of course, I want to have my big virtual screen back. This, of course, got dropped with randr 1.2, when the Virtual keyword was reused for something else. Matthias Hopf then re-added it in 1.3, mostly to appease me and the handful of other weirdos out there. But try finding this option in the xorg.conf manpage. Nothing! Try googling for it, and the first 50 hits either only explain the commandline version or the old style Virtual (which got broken). Apparently, you need to add 'Option "Panning" "${H}x${V}"'.

Easy, pico /etc/X11/xorg.. Damn. Nothing. head /var/log/Xorg.0.log says xorg.conf.d. Type man xorg.conf.d. Damn. Nothing: "No manual entry for xorg.conf.d" Suuuper. Apparently people are supposed to _know_ that this is part of the xorg.conf manpage.

So, create a new screen, device and monitor section in 01-screen in xorg.conf.d, and press ctrl-alt-backspace, like any experienced driver developer is used to. Damn. Nothing. Head into the gnome preferences stuff, enable the key combination. Try again. Drop into the console. Wait for the display manager to try again. And wait. And wait. Damn. Nothing again. Ok, the DM might have died, and I don't trust this new gnome stuff, so it might be better to reboot. So ctrl-alt-del, which worked first time round. At least something one can depend on.

Next time I look, ubuntu is showing its plymouth style loading screen, but the panel is gradually turning white. Something is not driving the panel; the driver died, for whatever reason. WTF? Try some key combinations to get a console. Damn, nothing! Pinging the box still worked, but of course no sshd was installed. Attempting a reboot didn't bring anything either; it just ran into the exact same issue. Nothing checks whether a previous boot got one to a working console or a working X.

So, insert the ubuntu installation cd, choose live system, mount the fs, chroot to it, apt-get install ssh, and less /var/log/Xorg.0.log to reveal:

> (==) Using config directory: "/etc/X11/xorg.conf.d"
> Parse error on line 3 of section Monitor in file /usr/lib/X11/xorg.conf.d/01-screen.conf
> The Option keyword requires 1 or 2 quoted strings to follow it.
> Parse error on line 3 of section Monitor in file /usr/lib/X11/xorg.conf.d/01-screen.conf
> "2560x1920" is not a valid keyword in this section.
> (EE) Problem parsing the config file
> (EE) Error parsing the config file
> Fatal server error:
> no screens found

I forgot to put quotes around "Panning", and I got greeted with a bleeding panel, with no easy way to get around it. What on earth are we thinking here?
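For anyone else who hits this, here is a minimal sketch of what the snippet should have looked like. The 2560x1920 value is the one from the log above; the Identifier is made up, and a temp file stands in for /etc/X11/xorg.conf.d/01-screen.conf:

```shell
# Write a minimal, correctly quoted snippet. Both the option name and
# its value must be quoted strings; a bare 2560x1920 is exactly what
# produces the "requires 1 or 2 quoted strings" parse error.
conf=$(mktemp)
cat > "$conf" <<'EOF'
Section "Monitor"
	Identifier "InternalPanel"
	Option "Panning" "2560x1920"
EndSection
EOF
grep 'Panning' "$conf"
```

At runtime, `xrandr --output <output-name> --panning 2560x1920` should achieve the same on a randr 1.3 server, assuming a new enough xrandr.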

This is Ubuntu LTS, with radeon, KMS, plymouth and xorg.conf.d: 5 nails in the free software desktop's coffin.

OpenSuSE 11.1 and a recent scanner [Jan. 5th, 2011|03:17 pm]
[Tags|, , , ]
[Current Location |couch... err... work... err. something.]
[mood |accomplished]
[music |Wax Tailor - Que sera]

As some might know, I am switching ("intermediate") employers, and I am going to do home-office from now on. Home-office probably has tons of advantages, but one disadvantage is that you need to own your own office hardware, like a printer and a scanner. Such beasts were sitting around in Belgium, but in the 3.5 years that I have been in Nuernberg, I have either depended on the office I was working for at the time, or I ran to the copyshop around the corner. The latter is extremely impractical and becomes rather expensive.

So, today, a Canon CanoScan LiDE 110 arrived from amazon (plus a basic samsung laser printer), and I have just succeeded in getting it to work with openSuSE 11.1, albeit in a very unscientific way. Here is how.

Sane is divided into frontends and backends; openSuSE 11.1 requires just an updated backend.

Only very recent sane-backends (git from halfway through december 2010) supports the LiDE 110 and 210, so grab the git repo.

Then, grab a recent openSuSE sane-backends package, for instance from here. Get yourself the src.rpm, and install it.

A crude way of getting something our specfile can work with is to tar -jc up the sane-backends git repo, and to move that tarball to /usr/src/packages/SOURCES.

Then edit the specfile: bump the "Version" and/or "Release" directives, make sure the "Source0" directive points to the correct tarball, and make sure that the line with "%setup" points to the right %{name}-... directory.

If you are as lucky as i was today, the existing patches, which are mostly about integration, will apply rather cleanly, and rpmbuild will succeed.
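The repack step above, sketched in a scratch directory with an empty stand-in for the real git checkout (all file and directory names here are illustrative); the one thing that matters is that the directory name inside the tarball matches what %setup expects:

```shell
# Scratch dir and a stand-in directory for the sane-backends checkout,
# just to show the mechanics of the repack.
work=$(mktemp -d)
cd "$work"
mkdir sane-backends-git                  # stand-in for the git clone
echo placeholder > sane-backends-git/README
tar -jcf sane-backends-git.tar.bz2 sane-backends-git
# On the real system, continue with:
#   mv sane-backends-git.tar.bz2 /usr/src/packages/SOURCES/
#   (edit the specfile: bump Release, fix Source0 and the %setup dir)
#   rpmbuild -bb sane-backends.spec
tar -tjf sane-backends-git.tar.bz2       # verify the tarball layout
```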

Install the created packages (you probably won't need -devel), and you should now be able to configure your scanner using yast. If yast complains about hal, then run rchal restart.

Now scanimage -L should be happy, and then you're all set.

Happy scanning!

The linux desktop is dead! [Sep. 17th, 2010|03:41 pm]
[Tags|, , , , , , , , , , , ]
[Current Location |France, Toulouse, XDS2010]
[mood |blah]

Or so it will be, soon, if these guys get their way.

Apparently, and this has been the hot new idea for the last year or two, for Xserver 1.10 people want to get rid of one of the greatest things that XFree86 brought us, and one of the better changes that happened after the fork: modular graphics drivers.

While the current proposal is simply to undo the modularization work of the mid-noughties (thanks jezza!), it immediately sparked the imagination of others to go even further (to which Alanc answered rather strikingly). But merging drivers back is in itself already a very damaging move.

So what is the goal behind merging drivers?

The official reason for this is "cleaning up the API", but I fail to see any logical link between being able to clean up APIs and mashing everything together.

There is simply nothing that stops APIs from being improved when drivers are not a full and whole part of the xserver build tree.

A mashed-together tree has no more advantage than a buildsystem like our tinderbox.

And having modular drivers does not mean that one has to have a fully static API and ABI; you just need dependable ABI bumping, and, to limit overhead, sane and forward-looking API changes. Free software drivers are of course best able to keep in sync with API changes, but this is no different whether they are external or internal to the server build tree.
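As an illustration, dependable version pinning for an out-of-tree driver is a one-liner in autoconf. This is a hypothetical configure.ac fragment, not taken from any actual driver; the module name is real pkg-config syntax, but the version bounds are made up:

```
# Accept any server API/ABI in a known-good range; refuse to configure
# against an untested one, instead of failing at runtime.
PKG_CHECK_MODULES(XORG, [xorg-server >= 1.7 xorg-server < 1.9])
```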

However, there is a difference in how one approaches API cleanups in a modular world, as one needs to think a bit more about how to do such API changes. This often leads to a cleaner design, a better structure, and it often means that people spend time trying to understand existing code, and how to best adjust it to fit the new needs, without throwing out the baby with the bathwater. By moving the drivers into the xserver tree, and outlawing the API, we will only open the door for having a libpciaccess type breakage every month.

So maybe this is the real wish behind wanting to merge back the drivers: being able to pull crazy stunts with half-arsed, badly structured and untested code, without implications, without accountability.

Apart from APIs degrading further, there are other more fundamental issues with this, with actually far reaching consequences.

By tying the graphics drivers to the X server, the only way one can get driver updates, be it bugfixes, new features or new hardware support, is by installing a new Xserver.

This is probably going to be claimed as a benefit, as people want more testing of upstream code, but a slight increase in the usage of upstream code will mean a much bigger decrease in the userbase of released code, and people will be even more afraid of updating anything in their system than they are today.

But this is how the kernel does it!

We've all heard this from our mothers: "If some other kid jumps off a cliff, is that a reason to jump off that cliff as well?"

Basically, while it might be a good idea for the often much simpler devices that have rather complete drivers (at least compared to graphics drivers :)) in the kernel to be a full and whole part of the kernel, it does not and will not work well for graphics drivers.

The complexity and the amount of movement in graphics drivers, especially with the many parts staying in userspace and the very unstable interfaces to them, makes this rather messy. And the only way that this is feasible is when those drivers are rather stable, and they definitely need to have a very stable ABI to userspace.

No-one will be able to maintain such a level of stability for graphics drivers, and I am sure that no-one will stand up to defend going that route if this requirement is mixed into the discussion.

How to sneak in a 1 to 1 version dependency between xserver, mesa and the linux kernel... Pt. 1.

In January this year, in the run-up to xserver 1.8, there was a commit to the xserver, labelled "xserver: require libdri 7.8.0 to build", where an autoconf rule was added to depend on this version of "libdri". I believe that this was mainly because of DRI2 changes.

When I say depend here, there is not a complete dependency on a given version of libdri. One can always build the xserver without any DRI support whatsoever. But who, on the desktop, really wants that today?

So while this all-or-nothing decision is in itself questionable, there is another question to be asked here: what is this libdri?

There is a dri.pc on most systems today, and there is a libdri.so on most systems today. The former is a package config file coming from the mesa tree; the latter is an xserver internal convenience library (hence the lack of so versioning). Smells fishy, doesn't it?

Now, while you might want to spend time looking high and low for the libdri from mesa, you will not find it. Mesa comes with 10 or more different libdris, one for each driver it supports, with the whole of mesa linked in statically, in the form of big fat <driver>_dri.so files.

Urgh, how broken is that?

So, the xserver now depends on 10 or more different, driver specific, enormous binaries, all because its dri support now depends on a given version of the dri protocol. Or, re-stating that, the xserver depends on a very specific version of the monolithic, 80s style, mesa tree.

Expanding the logic for the xserver and the drivers: why not just mash the mesa and xserver trees together then? :)

More parts come into play... (or dependency Pt. 2)

The xserver depends on the standard drm infrastructure, and this is compatible up to a 4+ year old release of libdrm, namely version 2.3.0, as the basic libdrm code has barely changed since.

Mesa, however, is a different story altogether. It depends, hard, on the latest version of libdrm, and this has been so since October 2008, when intel introduced libdrm_intel in libdrm 2.4.0.

In essence, this libdrm_intel is nothing more than a driver-stack internal convenience library. It only contains code that is specific to intel hardware, and the only things depending on it are other parts of the intel driver stack (if those parts were living separately, that is). There are no direct dependencies from anything else.

But, ever since October 2008, both the intel x driver and the intel mesa driver depend on the latest libdrm version, and since then, both radeon and nouveau have joined in the frenzy.

So, while there might be some backwards compatibility between dri drivers and libdrm drivers, the reality is that intel, radeon and nouveau are today playing hopscotch. Because mesa is monolithic, and at least one of its drivers is going to depend on the latest libdrm version, the whole of monolithic mesa simply depends on the latest libdrm version.

Since mesa has been depending on the latest libdrm for a few years now, and the xserver has been depending on the latest mesa version since the start of 2010, in turn, the xserver now depends on the latest libdrm version.


How does this tie in the kernel? (dependency Pt. 3).

Well, since libdrm has the driver specific sublibraries, those of course call drm driver specific ioctls, and of course these ioctls change all the time. While some people claim that they try to abstract at this layer (and that this strategy is good enough for everyone...), and claim to try to keep the kernel to userspace interface stable, this of course is only true for a very limited range of kernel and userspace combinations. Now we have intel, radeon _and_ nouveau playing at this level, dividing whatever median compatibility range there is by three.

The result is that libdrm can pretty much only be backwards compatible to the kernel by accident.

So, continuing our logic from earlier, the latest xserver depends on the latest mesa, the latest libdrm and the latest kernel.

Smashing lads! Well done! And all of this on a set of connections and foundations that make a house of cards look like a block of granite.

The root of the problem.

Graphics hardware is horribly complex. Several years ago, a single graphics card already broke the teraflop boundary, managing what a huge IBM supercomputer only managed a good decade earlier. Single graphics cards come with many hundreds of shader cores running at frequencies above 1GHz, have multiple gigabytes of ram, eat 200+ Watts, and can drive up to 6 displays today. There is no other single piece of hardware which is this complex.

And this complexity is of course also there in software.

You cannot count the different parts of a modern free software graphics driver stack on one hand anymore. There is the kernel drm part, the firmware, the libdrm part, the X driver, a pair of mesa drivers, an xvmc driver and possibly another media acceleration driver. A graphics driver stack can be made up of up to 8 parts today.

All of those parts are scattered over the system: 2 parts are shipped with the kernel, 1 part with libdrm, 2 drivers with mesa, and the remainder can be found in an xf86-video tree.

Naturally, in order to work optimally, these different parts have very direct and acute dependencies on each other. Bugs, new features and new hardware support incur changes to the interfaces between those different parts all the time.

The way those different parts are spread all over the place today makes it almost impossible to have an optimal setup. Most of the time one is glad if it works at all. What's more, this spread is the core reason for the de-facto 1-1 version tie between kernel, libdrm, xserver and mesa.

The consequences of a 1-1 version tie between kernel, xserver and mesa.

With graphics hardware and graphics drivers being this complex, there is simply no way to have them in a bugfree or a constant "useful" state.

We just _have_ to live with the fact that graphics drivers will be buggy, and we should try to handle this as gracefully as possible.

This means that we should be able to replace all or parts of the graphics driver stack at any time, without negatively affecting other parts of the system.

This is what our audience, our customers as it were, expect from us.

But, by having kernel, libdrm, xserver and mesa tied together, and the different parts of the driver stack spread over them, it is impossible to exchange 1 part of the graphics driver stack, or to exchange just the graphics driver stack, without changing the whole.

By forcing our users to update all this infrastructure each time, we will usually trigger a cascade of updates that reaches far up the whole software stack, to the extent where trying to fix some small issue in the graphics driver might mess up OpenOffice or another program that your average linux desktop user depends on.

Also, what is the chance of getting wireless, suspend/resume and your graphics driver all working to an acceptable level at the same time? It becomes very, very small, and when it does work, you had better not run into issues somewhere else, as an update might ruin that very precarious balance.

Killing the desktop for everyone.

No normal person will then be able to run a free software desktop system and expect to use it, because an arbitrary mix of hardware cannot possibly work together acceptably, at least not for a measurable amount of time.

What will be left over is preloads and embedded systems.

A preload is when some OEM, either itself or through a linux distributor, spends many, many man-years on making all the parts work together properly. In the end, images are produced which install on one very specific system and cannot be updated or maintained, except by a specialised team of people. Embedded systems basically work the same way: one combination of hardware, one image, no updates for average users except those provided by the manufacturer or its partners.

So while people might buy a free software based system in a shop around the corner, and be somewhat happy with it for a while, normal desktop users will be left out in the cold.

Looking further, by shutting out our own users, we will take away the breeding ground that free software is based on.

What solution is there?

By now, that should be pretty obvious.

Bring the different parts of the graphics driver stack together, and make its parts independent of the infrastructure they depend on.

This allows driver developers to change internal structure and API at will, while at the same time providing the infrastructure compatibility that users, hardware and distribution vendors require.

All it takes is a little care in designing infrastructure APIs, and a little care in keeping driver stacks compatible, even if that compatibility comes at the cost of disabling some features for some combinations of the infrastructure.

This is not hard to do, and it is done in multiple places.

Why the Nvidia binary driver is that popular.

In a recent phoronix survey, the number of users using Nvidia hardware and drivers was larger than the number using any other combination.

This has a reason, and it has nothing to do with Nvidia being a completely closed source shop. Nvidia gives users the ability to install any version of its graphics driver stack, and it will mostly be compatible with the environment it is installed in. This is simply what our users need.

What is affected by Nvidia being binary only is that Nvidia has to put in a lot of work to make things compatible. Free software drivers have a much, much easier task, or at least they would, if they, and the infrastructure they depend on, were developed in a different fashion than is the case today.

An open proof of concept.

My talk at FOSDEM, of course mentions my unichrome driver a lot, as it pretty much is my playground these days.

Even though the featurelist of this driver is very limited, it now integrates the X, DRM and DRI drivers in one massively backwards compatible build-system, with autotools detecting all the API changes across all currently used versions of the necessary infrastructure. What one can see there is that, when some care is taken in structuring the driver, it is not that hard to achieve this: it basically just takes the will to do it.

When I talked at FOSDEM, some people were stating that, while it might be possible for DRM and the Xserver, it would be totally impossible for Mesa/DRI, but for Mesa/gallium it should be easy.

Over the next month or so, I took all the Mesa versions that were out in the wild, split off the main libraries from the actual DRI drivers, created a set of headers as required by the drivers, created package config files, and then moved the drivers out to their own git repositories. Basically, a DRI SDK was created, and the drivers were now building and running externally against this SDK. This across 3 years of DRI development.

When I took that back to the Mesa community, what I of course got was indifference, and, suddenly, claims that while this SDK might be possible for mesa/DRI it would definitely not be possible for Mesa/gallium!

The future?

The proposed future direction for graphics drivers is to create unified graphics driver stacks. If we do not, we, the developers, might just as well stop working on free software graphics drivers altogether.

And while the current situation is bad, it is not impossible to fix. The problems are known and clear, and a path to the solution should by now also be clear, but the willingness to put in that bit of extra thought is simply lacking.

So guys, if you really want to move in the wrong direction, please state the real reasons for doing so, state the consequences for your users, and know what the end result will be.

The DRI SDK and modular DRI drivers. [Mar. 17th, 2010|01:31 am]
[Tags|, , , , , , , , , , ]
[Current Location |couch]
[mood |accomplished]
[music |Kinobe - Slip into something more comfortable.]

At FOSDEM I held a talk about "The free software graphics driver stack", analyzing the structure and the distribution of the different parts of the graphics driver stack, and, based on clear requirements, proposing a re-organization to be able to bring the individual parts of the graphics driver stack together. The slides for that talk are available here, and the audio and video of this talk have been made available too (Thanks Michael!).

Since I do not like to just talk, I also made a fully unified unichrome graphics driver stack available, proving that it is possible to unite the Xorg, DRM and DRI drivers in one tree, and that it is possible to provide a reasonable amount of backwards compatibility (trivial for a stagnated driver) even in the Mesa/DRI driver.

My slides have a TODO section, and the most significant of those TODOs was for Mesa. This listed providing shared libraries, pkgconfig files and the necessary header files for Mesa, much like the SDK that X has had ever since xfree86 4.0.0. Of course, such a thing would not happen on its own, so a bit after FOSDEM I set off, and now I have implemented just that.

You can find the SDK enabled Mesa DRI tree here, and the individual drivers can also be found in my personal git repositories. A README like document is available too, to explain what one should do to build and install the SDK and to build and install the drivers.

What this gets you, once the SDK is properly installed, is the ability to build and install just the DRI driver, to fix bugs, get some performance gains, or just provide the driver developers with good feedback (provided that someone updated the driver tree to the latest compatible changes, that is ;)). All this without having to replace your libGL, which is where all the dependencies come weighing in. Anyone who has had to update that so far knows how painful such an update is, and that it is often easier to do a major distribution upgrade.

So this brings us another step closer to make the free software desktop happen: the ability to easily update our graphics driver stack without touching anything else in our system!
