Log in

LIBV Intentionally Breaks Videodrivers [entries|archive|friends|userinfo]
Luc Verhaegen

[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

The value of writing code and instigating change. [Apr. 23rd, 2015|04:30 pm]
Luc Verhaegen
[Tags|, , , , , , , , , ]
[Current Location |Rumpelkammer.]
[mood |draineddrained]
[music |Groovesalad]

You probably saw this phoronix article which references the log of the #dri-devel channel on freenode. This was an attempt to trash my work on lima and tamil, using my inability to get much in the way of code done, and my unwillingness to throw hackish tamil code over the hedge, against me. Let me take some time to explain some of that from my point of view.

Yes, I have lost traction.

Yes, i am not too motivated to work on cleaning up code at this time. I haven't been too motivated for anything much since FOSDEM 2014. I still have my sunxi kms code to fix up/clean up, and the same is true for the lima driver for mesa. It is also pretty telling that i started to write this blog entry more than two months ago and only now managed to post it.

Under the circumstances, though, everyone else would've given up about 2 to 3 years ago.

Due to my usual combination of stubbornness and actually sitting down and doing the work, I have personally kickstarted the whole open ARM GPU movement. This is the third enormous paradigm shift in Linux graphics that i have caused these past almost 12 years (after modesetting, and pushing ATI to a point of no return with an open source driver). All open ARM GPU projects were at least inspired by this, some actually use some lima code, others would simply not have happened if I hadn't done Lima and/or had kept my mouth shut at important junctions. This was not without sacrifices. Quite the opposite.

From March 2011 on, I have spent an insane amount of time on this. Codethink paid, all-in-all, 6 weeks of my time when i was between customer projects. Some of the very early work was done on Nokia time, as, thanks to our good friend Stephen Elop, operations were winding down severely in late 2011 to the point where mostly constant availability was needed. However, I could've used that extra free time in many other useful ways. When I got a job offer at SuSE in november 2011 (basically getting my old job back due to Matthias Hopf taking up a professorship), I turned it down so I could work on the highly important Lima work instead.

When Codethink ran into a cashflow issue in Oktober 2012 (which apparently was resolved quite successfully, as codethink has grown a lot more since then), I was shown the door. This wasn't too unreasonable a decision given my increasing disappointment with how lima was being treated, the customer projects i was being sent on, and the assumption that a high profile developer like me would have an easy time getting another employer. During the phonecall announcing my dismissal, I did however get told that ARM had already been informed about my dismissal, so that business relations between ARM and codethink could be resumed. I rather doubt that that is standard procedure for dismissing people.

The time after this has been pretty meagre. If you expected at least one of linaro, canonical, mozilla or jolla to support lima, you'd be pretty wrong. Similarly, i have not seen any credible attempt to support lima driver development on a consulting basis. Then there is that fact that applying to companies which use mali and which have no interest in using an open source driver for mali is completely out, as that would mean working on the ARM binary driver, which in turn would mean that i would no longer be allowed to work on lima. All other companies have not gotten with the times with respect to hiring policies, this ranges from demanding that people relocate to interviewing solely about OpenGL spec corners for supposed driver development positions. There was one case in this whole time which had a proper first level interview, and where everything seemed to be working out, only to go dead without a proper explanation after a while. I have since learned from two independent sources that my hiring was stopped due to politics involving ARM. Lima has done wonders making me totally unhirable.

In May 2013 i wrote another proposal to ARM for an open source strategy for Mali (the first was done in Q2 2011 as part of codethink), hoping that in the meantime sentiments had shifted enough and that the absence of an in-between company would make the difference this time round. It got the unofficial backing of some people inside ARM, but they just ended up getting their heads chewed off, and I hope that they didn't suffer too much for trying to do the right thing. The speed with which this proposal was rejected by Jem Davies (ARM MPD VP of Technology), and the undertone of the email reply he sent me, suggests that his mind was made up beforehand and that he wasn't too happy about my actions this time either. This is quite amazing, as one would expect anyone at the VP level of a major company to at least keep his options open. Between the way in which this proposal was rejected and the political pressure ARM seems to apply against Lima, Mr Davies his claims that ARM simply has no customer demand and no budget, and nothing more, are very hard to believe.

So after the polite applause at the end of my FOSDEM2014 talks, it hit me. That applause was the only support i was going to ever get for my work, and i was just going to continue to hit walls everywhere, both with lima and in life. And it would take another year until the next applause, if at all. To be absolutely honest, given the way the modesetting and ATI stories worked out, i should be used to that by now, however hard it still is. But this time round i had been (and of course still am) living with my fantastic and amazingly understanding girlfriend for several years, and she ended up supporting me through Q4 2013 and most of 2014 when my unemployment benefit ran out. This time round, my trailblazing wasn't backfiring on me alone, i was dragging someone else down with me, someone who deserves much better.

I ended up being barely able to write code throughout most of 2014, and focused entirely on linux-sunxi. The best way to block things out was rewriting the linux-sunxi wiki, and that wiki is now lightyears ahead of everything else out there, and an example for everyone else (like linux-exynos and linux-rockchip) for years to come. The few code changes i did make were always tiny and trivial. In august Siarhei Siamashka convinced me that a small display driver for sunxi for u-boot would be a good quick project, and i hoped that it would wet my appetite for proper code again. Sadly though, the fact that sunxi doesn't hide all chip details like the Raspberry Pi does (this was the target device for simplefb) meant that simplefb had to be (generically) extended, and this proved once and for all that simplefb is a total misnomer and that people had been kidding themselves all along (i seem to always run into things like this). The resulting showdown had about 1.5 times the number of emails as simplefb has lines of C code. Needless to say, this did not help my mood either.

In June 2014, a very good friend of mine, who has been a Qt consultant for ages, convinced me to try consulting again. It took until October before something suited came along. This project was pretty hopeless from the start, but it reinstated my faith in my problem solving skills and how valuable those really could be, both for a project, and, for a change, for myself. While i wasn't able to afford traveling to Bordeaux for XDC2014, i was able to afford FOSDEM2015 and I am paying for my own upkeep for the first time in a few years. Finally replacing my incredibly worn and battered probook is still out of the question though.

When i was putting together the FOSDEM Graphics DevRoom schedule around christmas, I noticed that there was a inexcuseably low amount of talks, and my fingers became itchy again. The consulting job had boosted morale sufficiently to do some code again and showing off some initial renders on Mali T series seemed doable within the 5 or so weeks left. It gave the FOSDEM graphics devroom another nice scoop, and filled in one of the many many gaps in the schedule. This time round i didn't kid myself that it would change the world or help boost my profile or anything. This time round, i was just going to prove to myself that i have learned a lot since doing Lima, and that i can do these sort of things still, with ease, while having fun doing so and not exerting myself too much. And that's exactly what I did, nothing more and nothing less.

The vultures are circling.

When Egbert Eich and I were brainstorming about freeing ATI back in April and May of 2007, I stated that I feared that certain "community" members would not accept me doing something this big, and I named 3 individuals by name. While I was initially alone with that view, we did decide that, in light of the shitthrowing contest around XGL and compiz, doing everything right for RadeonHD was absolutely paramount. This is why we did everything in the open, at least from the moment AMD allowed us to do so. Open docs, irc channel, public ml, as short a turn-around on patches as possible given our quality standards and the fact that ATI was not helping at all. I did have a really tough uphill battle convincing everyone that doing native C code was the only way, as, even with all the technical arguments for native code, I knew that my own reputation was at stake. I knew that if I had accepted AtomBIOS for basic modesetting, after only just having convinced the world that BIOS free modesetting was the only way forward, I would've personally been nailed to the cross by those same people.

From the start we knew that ATI was not going to go down without a fight on this one, and I feared that those "community" members would do many things to try to damage this project, but we did not anticipate AMD losing political control over ATI internally, nor did we anticipate the lack of scruples of those individuals. AMD needed this open source driver to be good, as the world hated fglrx to the extent that the ATI reputation was dragging AMDs chips and server market down, and AMD was initially very happy with our work on RadeonHD. Yet to our amazement, some ATI employees sided with those same "community" members, and together they worked on a fork of RadeonHD, but this time doing everything the ATI way, like fglrx.

The amount of misinformation and sometimes even downright slander in this whole story was amazing. Those individuals were actively working on remarketing RadeonHD as an evil Microsoft/Novell conspiracy and while the internet previously was shouting "death to fglrx", it ended up shouting "death to radeonhd". Everyone only seemed to want to join into the shouting, and nobody took a step back to analyze what was really going on. This was ATI & red hat vs AMD & SuSE, and it had absolutely nothing to do with creating the best possible open source driver for ATI hardware.

During all of this I was being internally reminded that SuSE is the good guy and that it doesn't stoop down to the level of the shitthrowers. Being who I am, I didn't always manage to restrain myself, but even today I feel that if I had been more vocal, i could've warded off some of the most inane nonsense that was spewed. When Novell clipped SuSEs wings even further after FOSDEM2009, i was up for the chop too, and I was relieved as i finally could get some distance between myself and this clusterfuck (that once was a free software dream). I also knew that my reputation was in tatters, and that if i had been more at liberty to debunk things, the damage there could've been limited.

The vultures had won, and no-one had called them out for their insidious and shameless behaviour. In fact, they felt empowered, free to use whatever nasty and underhand tactics against anything that doesn't suit them for political or egotistical reasons. This is why the hacking of the RadeonHD repository occured, and why the reaction of the X.org community was aimed against the messenger and victim, and not against the evil-doers (I was pretty certain before I sent that emial that it was either one or the other, i hadn't forseen that it would be both though).

So when the idea of Lima formed in March 2011, I no longer just feared that these people would react. I knew for a fact that they would be there, ready to pounce on the first misstep made. This fed my frustration with how Lima was being treated by Codethink, as I knew that, in the end, I would be the one paying the price again. Similarly, if my code was not good enough, these individuals would think nothing of it to step in, rewrite history, and then trash me for not having done corner X or Y, as that was part of their modus operandi when they destroyed RadeonHD.

So these last few years, i have been caught in a catch-22 situation. ARM GPUs are becoming free, and I have clearly achieved what i set out to achieve. The fact that I so badly misjudged both ARM and the level of support i would get should not take away from the fundamental changes that came from my labour. Logically and pragmatically, it should be pretty hard to fault me in this whole story, but this is not how this game works. No matter what fundamental difference I could've made with lima, no matter how large and clear my contributions and sacrifices would be, these people would still try to distort history and try their utmost best to defame me. And now that moment has come.

But there are a few fundamental differences this time round: I am able to respond as i see fit as there is no corporate structure to shut me up, and I truly have nothing left to lose.

What started this?

To put it short:
12:59 #dri-devel: < Jasper> the alternative is fund an open-source replacement, but touching
                  lima is not an option
Jasper St. Pierre is very vocal on #dri-devel, constantly coming up with questions and asking for help or guidance. But he and his employer Endless Mobile seem to have their mind set on using the ARM Mali DDK, ARMs binary driver. He is working on integrating the Mali DDK with things like wayland, and he is very active in getting help on doing so. He has been doing so for several months now.

The above irc quote is not what i initially responded to though, the following is however:
13:01 #dri-devel: < Jasper> lima is tainted
13:01 #dri-devel: < Jasper> there's still code somewhere on a certain person's hard disk he
                  won't release because he's mad about things and the other project contributors
                  all quit
13:01 #dri-devel: < Jasper> you know that
13:02 #dri-devel: < Jasper> i'm not touching lima with a ten foot pole.
This is pure unadulterated slander.

Lima is not in any way tainted. It is freshly written MIT code and I was never exposed to original ARM code. This statement is an outrageous lie, and it is amazing that anyone in the open source world would lie like that and expect to get away with it.

Secondly, I am not "mad about things". I am severely de-motivated, which is quite different. But even then, apart from me not producing much code, there had been no real indication to Jasper St Pierre until that point that this were the case. Also, "all" the contributors to Lima did not quit. I am still there, albeit not able to produce much in the way of code atm. Connor is in college now and just spent the summer interning for Intel writing a new IR for mesa (and was involved with Khronos SPIR-V). Ben Brewer only worked on Lima on codethink's time, and that was stopped halfway 2012. Those were the only major contributors to Lima. The second line Jasper said there was another baseless lie.

Here is another highly questionable statement from Mr. St Pierre:
13:05 #dri-devel: < Jasper> robclark, i think it's too late tbh -- any money we throw at lima
                  would be tainted i think
What Jasper probably meant is that HE is the one who is tainted by having worked with the ARM Mali DDK directly. But his quick fire statements at 13:01 definitely could've fooled anyone into something entirely different. Anyone who reads that sees "Lima is tainted because libv is a total asshole". This is how things worked during RadeonHD as well. Throw some halftruths out together, and let the internet noise machine do the rest.

Here is an example of what happens then:
13:11 #dri-devel: < EdB> Jasper: what about the open source driver that try to support the next
                  generation (I don't remember the name). Is it also tainted ?
13:19 #dri-devel: < EdB> I never heard of that lima tainted things before and I was surprised
Notice that Jasper made absolutely no effort to properly clarify his statements afterwards. This is where i came in to at least try to defuse this baseless shitthrowing.

When Jasper was confronted, his statements became even more, let's call it, varied. Suddenly, the reason for not using lima is stated here:
13:53 #dri-devel: < Jasper> Lima wasn't good enough in a performance evaluation and since it
                  showed no activity for a few years, we decided to pay for a DDK in order to 
                  ship a product instead of working on Lima instead.
To me, that sounds like a pretty complete reason as to why Endless Mobile chose not to use Lima at the time. It does however not exclude trying to help further Lima at the same time, or later on, like now. But then the following statements came out of Jasper...
13:56 #dri-devel: < Jasper> libv, if we want to contribute to Lima, how can we do so?
13:57 #dri-devel: < Jasper> We'd obviously love an active open-source driver for Mali.
13:58 #dri-devel: < Jasper> I didn't see any way to contribute money through http://limadriver.org/
13:58 #dri-devel: < Jasper> Or have consulting done
Good to know... But then...
13:58 #dri-devel: < libv> Jasper: you or your company could've asked.
13:59 #dri-devel: < Jasper> My understanding was that we did.
13:59 #dri-devel: < Jasper> But that happened before I arrived.
13:59 #dri-devel: < libv> definitely not true.
13:59 #dri-devel: < Jasper> I was told we tried to reach out to the Lima project with no response.
                  I don't know how or what happened.
14:00 #dri-devel: < libv> i would remember such an extremely rare event
The supposed reaching out from EndlessM to lima seems pretty weird when taking the earlier statements into account, like at 13:53 and especially at 13:01. Why would anyone make those statements and then turn around and offer to help. What is Jasper hiding?

After that I couldn't keep my mouth shut and voiced my own frustration with how demotivated i am on doing the cleanup and release, and whined about some possible reasons as to why that is so. And then, suddenly...
14:52 #dri-devel: < Jasper> libv, so the reason we didn't contribute to Lima was because we didn't
                  imagine anything would come of it.
14:53 #dri-devel: < Jasper> It seems we were correct.
It seems that now Jasper his memory was refreshed, again, and that EndlessM never did reach out for another reason altogether...

A bit further down, Jasper lets the following slip:
14:57 #dri-devel: < Jasper> libv, I know about the radeonhd story. jrb has told me quite a lot.
14:58 #dri-devel: < Jasper> Jonathan Blandford. My boss at Endless, and my former boss at Red Hat.
And suddenly the whole thing becomes clear. In the private conversation that ensued Jasper explained that he is good friends with exactly the 3 persons named before, and that those 3 persons told him all he wanted to know about the RadeonHD project. It also seems that Jonathan Blandford was, in some way, involved with Red Hats decision to sign a pact with the ATI devil to produce a fork of a proper open source driver for AMD. These people simply never will go near anything I do, and would rather spend their time slandering yours truly than to actually contribute or provide support, and this is exactly what has been happening here too.

This brings us straight back to what was stated before anything else i quoted so far:
12:57 #dri-devel: < EdB> Jasper: I guess lima is to be avoided
12:58 #dri-devel: < Jasper> EdB, yeah, but that's not for legal reasons
Not for legal reasons indeed.

But the lies didn't end there. Next to coming up with more reasons to not contribute to lima, in that same private conversation Jasper "kindly" suggested that he'd continue Lima if only i throw my code over the wall. Didn't he state previously that Lima was tainted, with which i think he meant that he was tainted and couldn't work on lima?


At the time, i knew i was being played and bullshitted, but I couldn't make it stick on a single statement alone in the maelstrom that is IRC. It is only after rereading the irc log and putting Jaspers statements next to eachother that the real story becomes clear. You might feel that i am reading too much into this, and that i have deliberately taken Jaspers statements out of context to make him look bad. But I implore you to verify the above with the irc log, and you will see that i have left very little necessary context out. The above is Jaspers statements, with my commentary, put closely together, removing the noise (which is a really bad description for the rest of the irc discussion that went on at that time) and boosting the signal. Jasper statements are highly erratic, they make no sense when they are considered collectively as truths, and some are plain lies when considered individually.

Jasper has been spending a lot of time asking for help for supporting a binary driver. Nobody in the business does that. I guess that (former) red hat people are special. I still have not understood why it is that Red Hat is able to get away with so much while companies like canonical and SuSE get trashed so easily. But that is besides the point here; no-one questioned why Jasper is wasting so much time of open source developers on a binary driver. Jasper was indirectly asked why he was not using Lima, nothing more.

If Jasper had just stated that Endless Mobile decided to not use Lima because Endless Mobile thought Lima was not far enough along yet, and that Endless Mobile did not have the resources to do or fund all the necessary work still, then i think that that would've been an acceptable answer for EdB. It's not something i like hearing, but i am quite used to this by now. Crucially though, i would not have been in a position to complain.

Jasper instead went straight to trashing me and my work, with statements that can only be described as slander. When questioned, he then gave a series of mutually exclusive statements, which can only be explained as "making things up as he goes along", working very hard to mask the original reasons for not supporting Lima at all.

This is more than just "Lima is not ready", otherwise Jasper would have been comfortable sticking to that story. Jasper, his boss and Endless Mobile did not decide against Lima on any technical or logistical basis. This whole story never was about code at all. These people simply would do anything but support something that involves me, no matter how groundbreaking, necessary or worthy my work is. They would much rather trash and re-invent, and then trash some more...

It of course did not take long for the other vultures to join the frenzy... When the discussion on irc was long over, Daniel Stone made one attempt to revive it, to no avail. Then, a day later, someone anonymously informed phoronix, and Dave Airlie could then clearly be seen to try to stoke the fire, again, with limited success. I guess the fact that most of the phoronix users still only care about desktop hardware played to my advantage here (for once). It does very clearly show how these people work though, and I know that they will continue to try to play this card, trying hard to trigger another misstep from me or a cause a proper shitstorm on the web.

So what now.

Now I have a choice. I can wait until i am in the mood to clean up this code and produce something useful for public consumption, and in the meantime see my name and my work slandered. Or... If i throw my (nasty) code over the wall, someone will rename/rewrite it, mess up a lot of things, and trash me for not having done corner X or Y, so essentially slander me and my work and probably even erase my achievements from history, by using more of my work against me... And in the latter case i give people like Jasper and companies like Endless Mobile what they want, in spite of their contemptible actions and statements. Logically, there is only one conclusion possible in that constellation.

At the end of the day, i am still the author of this code and it was my time that I burned on it, my life that i wasted on it. I have no obligations to anyone, and am free to do with my code what i want. I cannot be forced into anything. I will therefor release my work, under my own terms, when i want to and when i feel ok doing so.

This whole incident did not exactly motivate me to spend time on cleaning code up and releasing it. At best, it confirmed my views on open source software and certain parts of the X.org community.

This game never was about code or about doing The Right Thing, and never will be.
link18 comments|post comment

PowerVR SGX code leaked. [Nov. 21st, 2014|11:22 pm]
Luc Verhaegen
[Tags|, , , , , , , , , ]
[Current Location |Rumpelkammer again, finally.]
[mood |Appalled at people, but hopeful.]
[music |Foo Fighters - The colour and the shape - Monkey wrench]

So someone leaked 2011 era PowerVR SGX microcode and user space... And now everyone is pissing themselves like a bunch of overexcited puppies...

I've been fed links from several sides now, and i cannot believe how short-sighted and irresponsible people are, including a few people who should know better.


Having gotten that out of the way, I am writing this blog to put everyone straight and stop the nonsense, and to calmly explain why this leak is not a good thing.

Before i go any further, IANAL, but i clearly do seem to tread much more carefully on these issues than most. As always, feel free to debunk what i write here in the comments, especially you actual lawyers, especially those lawyers in the .EU.

LIBV and the PVR.

Let me just, once again, state my position towards the PowerVR.

I have worked on the Nokia N9, primarily on the SGX kernel side (which is of course GPLed), but i also touched both the microcode and userspace. So I have seen the code, worked with and i am very much burned on it. Unless IMG itself gives me permission to do so, i am not allowed to contribute to any open source driver for the PowerVR. I personally also include the RGX, and not just SGX, in that list, as i believe that some things do remain the same. The same is true for Rob Clark, who worked with PowerVR when at Texas Instruments.

This is, however, not why i try to keep people from REing the PowerVR.

The reason why i tell people to stay away is because of the design of the PowerVR and its driver stack: PVR is heavily microcode driven, and this microcode is loaded through the kernel from userspace. The microcode communicates directly with the kernel through some shared structs, which change depending on build options. There are sometimes extensive changes to both the microcode, kernel and userspace code depending on the revision of the SGX, customer project and build options, and sometimes the whole stack is affected, from microcode to userspace. This makes the powervr a very unstable platform: change one component, and the whole house of cards comes tumbling down. A nightmare for system integrators, but also bad news for people looking to provide a free driver for this platform. As if the murderous release cycle of mobile hardware wasn't bad enough of a moving target already.

The logic behind me attempting to keep people away from REing the PowerVR is, at one end, the attempt to focus the available decent developers on more rewarding GPUs and to keep people from burning out on something as shaky as the PowerVR. On the other hand, by getting everyone working on the other GPUs, we are slowly forcing the whole market open, singling out Imagination Technologies. At one point, IMG will be forced to either do this work itself, and/or to directly support open sourcing themselves, or to remain the black sheep forever.

None of the above means that I am against an open source driver for PVR, quite the opposite, I just find it more productive to work on the other GPUs amd wait this one out.

Given their bad reputation with system integrators, their shaky driver/microcode design, and the fact that they are in a cut throat competition with ARM, Imagination Technologies actually has the most to gain from an open source driver. It would at least take some of the pain out of that shaky microcode/kernel/userspace combination, and make a lot of peoples lives a lot easier.

This is not open source software.

Just because someone leaked this code, it has not magically become free software.

It is still just as proprietary as before. You cannot use this code in any open source project, or at all, the license on it applies just as strongly as before. If you download it, or distribute it, or whatever other actions forbidden in the license, you are just as accountable as the other parties in the chain.

So for all you kiddies who now think "Great, finally an open driver for PowerVR, let's go hack our way into celebrity", you couldn't be more wrong. At best, you just tainted yourself.

But the repercussion go further than that. The simple fact that this code has been leaked has cast a very dark shadow on any future open source project that might involve the powervr. So be glad that we have been pretty good at dissuading people from wasting their time on powervr, and that this leak didn't end up spoiling many man-years of work.

Why? Well, let's say that there was an advanced and active PowerVR reverse engineering project. Naturally, the contributors would not be able to look at the leaked code. But it goes further than that. Say that you are the project maintainer of such a reverse engineered driver, how do you deal with patches that come in from now on? Are you sure that they are not taken more or less directly from the leaked driver? How do you prove this?

Your fun project just turned from a relatively straightforward REing project to a project where patches absolutely need to be signed-off, and where you need to establish some severe trust into your contributors. That's going to slow you down massively.

But even if you can manage to keep your code clean, the stigma will remain. Even if lawyers do not get involved, you will spend a lot of time preparing yourself for such an eventuality. Not a fun position to be in.

The manpower issue.

I know that any clued and motivated individual can achieve anything. I also know that really clued people, who are dedicated and can work structuredly are extremely rare and that their time is unbelievably valuable.

With the exception of Rob, who is allowed to spend some of his redhat time on the freedreno driver, none of the people working on the open ARM GPU drivers have any support. Working on such a long haul project without support either limits the amount of time available for it, or severely reduces the living standard of the person doing so, or anywhere between those extremes. If you then factor in that there are only a handful of people working on a handful of drivers, you get individuals spending several man-years mostly on their own for themselves.

If you are wondering why ARM GPU drivers are not moving faster, then this is why. There are just a limited few clued individuals who are doing this, and they are on their own, and they have been at it for years by now. Think of that the next time you want to ask "Is it done yet?".

This is why I tried to keep people from REing the powerVR, what little talent and stamina there is can be better put to use on more straightforward GPUs. We have a hard enough time as it is already.

Less work? More work!

If you think that this leaked driver takes away much of the hard work of reverse engineering and makes writing an open source driver easy, you couldn't be more wrong.

This leak means that here is no other option left apart from doing a full clean room. And there need to be very visible and fully transparent processes in place in a case like this. Your one man memory dumper/bit-poker/driver writer just became at least two persons. One of them gets to spend his time ogling bad code (which proprietary code usually ends up being), trying to make sense of it, and then trying to write extensive documentation about it (without being able to test his findings much). The other gets to write code from that documentation, but also little more. Both sides are very much forbidden to go back and forth between those two positions.

As if we ARM GPU driver developers didn't have enough frustration to deal with, and the PVR stack isn't bad enough already, the whole situation just got much much worse.

So for all those who think that now the floodgates are open for PowerVR, don't hold your breath. And to those who now suddenly want to create an open source driver for the powervr, i ask: you and what army?

For all those who are rinsing out their shoes ask yourself how many unsupported man-years you will honestly be able to dedicate to this, and whether there will be enough individuals who can honestly claim the same. Then pick your boring task, and then stick to it. Forever. And hope that the others also stick to their side of this bargain.

LOL, http://goo.gl/kbBEPX

What have we come to?

The leaked source code of a proprietary graphics driver is not something you should be spreading amongst your friends for "Lolz", especially not amongst your open source graphics driver developing friends.

I personally am not too bothered about the actual content of this one, the link names were clear about what it was, and I had seen it before. I was burned before, so i quickly delved in to verify that this was indeed SGX userspace. In some cases, with the links being posted publicly, i then quickly moved on to dissuade people from looking at it, for what limited success that could have had.

But what would i have done if this were Mali code, and the content was not clear from the link name? I got lucky here.

I am horrified about the lack of responsibility of a lot of people. These are not some cat pictures, or some nude celebrities. This is code that forbids people from writing graphics drivers.

But even if you haven't looked at this code yet, most of the damage has been done. A reverse engineered driver for powervr SGX will now probably never happen. Heck, i just got told that someone even went and posted the links to the powerVR REing mailinglist (which luckily has never seen much traffic). I wonder how that went:
Are you the guys doing the open source driver for PowerVR SGX?
I have some proprietary code here that could help you speed things along.
Good luck!

So for the person who put this up on github: thank you so much. I hope that you at least didn't use your real name. I cannot imagine that any employer would want to hire anyone who acts this irresponsibly. Your inability to read licenses means that you cannot be trusted with either proprietary code or open source code, as you seem unable to distinguish between them. Well done.

The real culprit is of course LG, for crazily sticking the GPL on this. But because one party "accidentally" sticks a GPL on that doesn't make it GPL, and that doesn't suddenly give you the right to repeat the mistake.

Last months ISA release.

And now for something slightly different...

Just over a month ago, there was the announcement about Imagination Technologies' new SDK. Supposedly, at least according to the phoronix article, Imagination Technologies made the ISA (instruction set architecture) of the RGX available in it.

This was not true.

What was released was the assembly language for the PowerVR shaders, which then needs to be assembled by the IMG RGX assembler to provide the actual shader binaries. This is definitely not the ISA, and I do not know whether it was Alexandru Voica (an Imagination marketing guy who suddenly became active on the phoronix forums, and who i believe to be the originator of this story) or the author of the article on Phoronix who made this error. I do not think that this was bad intent though, just that something got lost in translation.

The release of the assembly language is very nice though. It makes it relatively straightforward to match the assembly to the machine code, and takes away most of the pain of ISA REing.

Despite the botched message, this was a big step forwards for ARM GPU makers; Imagination delivered what its customers need (in this case, the ability to manually tune some shaders), and in the process it also made it easier for potential REers to create an open source driver.

Looking forward.

Between the leak, the assembly release, and the market position Imagination Technologies is in, things are looking up though.

Whereas the leak made a credible open source reverse engineering project horribly impractical and very unlikely, it did remove some of the incentive for IMG to not support an open source project themselves. I doubt that IMG will now try to bullshit us with the inane patent excuse. The (not too credible) potential damage has been done here already now.

With the assembly language release, a lot of the inner workings and the optimization of the RGX shaders was also made public. So there too the barrier has disappeared.

Given the structure of the IMG graphics driver stack, system integrators have a limited level of satisfaction with IMG. I really doubt that this has improved too much since my Nokia days. Going open source now, by actively supporting some clued open source developers and by providing extensive NDA-free documentation, should not pose much of a legal or political challenge anymore, and could massively improve the perception of Imagination Technologies, and their hardware.

So go for it, IMG. No-one else is going to do this for you, and you can only gain from it!
link9 comments|post comment

ARM Mali Midgard, as featured on Anandtech. [Jul. 16th, 2014|12:40 pm]
Luc Verhaegen
[Tags|, , , , , , , , , , , , , , ]
[Current Location |Rumpelkammer]
[music |Arms and Sleepers - Antwerp]

Anandtech recently went all out on the ARM midgard architecture (Mali T series). This was quite astounding, as ARM MPD tends to be a pretty closed shop. The Anandtech coverage included an in-depth view of the Mali Midgard GPU, a (short) Q&A session with Jem Davies (the head honcho of ARM MPD, ARMs Media Processing Division, the part of ARM that develops the Mali and display and video engines) and a google hangout with Jem Davies a week later.

This set of articles does not seem like the sort of thing that ARM MPD would have initiated itself. Since both Imagination Technologies and NVidia did something similar months earlier, my feeling is that this was either initiated by Anand Lal Shimpi himself, or that this was requested by ARM marketing in response to the other articles.

Several interesting observations can be made from this though, especially from the answers (or sometimes, lack thereof) to the Q&A and google hangout sessions.

Hiding behind Linaro.

First off, Mr Davies still does not see an open source driver as a worthwhile endeavour for ARM MPD, and this is a position that hasn't changed since i started the lima driver, when my former employer went and talked to ARM management. Rumour has it that most of ARMs engineers both in MPD and other departments would like this to be different, and that Mr Davies is mostly alone with his views, but that's currently just hearsay. He himself states that there are only business reasons against an open source driver for the mali.

To give some weight to this, Mr Davies stated that he contributed to the linux kernel, and i called him out on that one, as i couldn't find any mention of him in a kernel git tree. It seems however that his contributions are from during the Bitkeeper days, and that the author trail on those changes probably got lost. But, having contributed to a project at one point or another is, to me, not proof that one actively supports the idea of open source software, at best it proves that adding support to the kernel for a given ARM device or subsystem was simply necessary at one point.

Mr Davies also talked about how ARM is investing a lot in linaro, as a proof of ARMs support of open source software. Linaro is a consortium to further linux on ARM, so per definition ARM plays a very big role in it. But it is not ARM MPD that drives linaro, it is ARM itself. So this is not proof of ARM MPD actively supporting open source software. Mr Davies did not claim differently, but this distinction should be made very clear in this context.

Then, linaro can be described as an industry consortium. For non-founding members of a consortium, such a construction is often used to park some less useful people while gaining the priviledge to claim involvement as and when desired. The difference to other consortiums is that most of the members come from a deeply embedded background, where the word "open" never was spoken before, and, magically, simply by having joined linaro, those deeply embedded companies now feel like they succesfully ticked the "open source" box on their marketing checklist. Several of linaros members are still having severe difficulty conforming to the GPL, but they still proudly wear the linaro badge as proof of their open source...ness?

As a prominent member of the sunxi community, I am most familiar with Allwinner, a small chinese cheap SoC designer. At the start of the year, we were seeing some solid signs of Allwinner opening up to our community directly. In March however, Allwinner joined linaro and people were hopeful that this meant that a new era of openness had started for Allwinner. As usual, I was the only cynical voice and i warned that this could mean that Allwinner now wouldn't see the need to further engage with us. Ever since, we haven't been able to reach our contacts inside Allwinner anymore, and even our requests for compliance with the GPL get ignored.

Linaro membership does not absolve from limited open source involvement or downright license violation, but for many members, this is exactly how it is used. Linaro seems to be a get-out-of-jail-free card for several of its members. Linaro membership does not need to prove anything, linaro membership even seems to have the opposite effect in several cases.

ARM driving linaro is simply no proof that ARM MPD supports open source software.

The patent excuse.

I am amazed that people still attempt to use this as an argument against open source graphics drivers.

Usually this is combined with the claim that open source drivers are exposing too much of the inner workings of the hardware. But this logic in itself states that the hardware is the problem, not the software. The hardware itself might or might not have patent issues, and it is just a matter of time before the owner of said breached patents will come a-knocking. At best, an open source driver might speed up the discovery of said issues, but the driver itself never is the cause, as the problems will have been there all along.

One would actually think that the Anandtech article about the midgard architecture would reveal more about the hardware, and trigger more litigation, than the lima driver could ever do, especially given how neatly packaged an in depth anandtech article is. Yet ARM MPD seemed to have had no issue with exposing this much information in their marketing blitz.

I also do not believe that patents are such a big issue. If graphics hardware patents were such big business, you would expect that an industry expert in graphics, especially one who is a dab hand at reverse engineering, would be contacted all the time to help expose potential patent issues. Yet i never have been contacted, and i know of no-one who ever has been.

Similarly. the first bits of lima code were made available 2.5 years ago, with bits trickling out slowly (much to my regret), and there are still several unknowns today. If lima played any role in patent disputes, you would again expect that i would be asked to support those looking to assert their patents. Again, nothing.

GPU Patents are just an excuse, nothing more.

When I was at SuSE, we freed ATI for AMD, and we never did hear that excuse. AMD wanted a solid open source strategy for ATI as ATI was not playing ball after the merger, and the bad publicity was hurting server (CPU) sales. Once the decision was made to go the open source route, patents suddenly were not an issue anymore. We did however have to deal with IP issues (or actually, AMD did - we made very sure we didn't get anything that wasn't supposed to be free), such as HDCP and media decoding, which ATI was not at liberty to make public. Given the very heated war that ATI and Nvidia fought at the time, and the huge amount of revenue in this market, you would think that ATI would be a very likely candidate for patent litigation, yet this never stood in the way of an open source driver.

There is another reason as to why patents are that popular an excuse. The words "troll" and "legal wrangling" are often sprinkled around as well so that images of shady deals being made by lawyers in smokey backrooms usually come to mind. Yet we never get to hear the details of patent cases, as even Mr Davies himself states that ARM is not making details available of ongoing cases. I also do not know of any public details on cases that have been closed already (not that i have actively looked - feel free to enlighten me). Patents are a perfect blanket excuse where proof apparently does not seem to be required.

We open source developers are very much aware of the damage that software patents do, and this makes the patent weapon perfect for deployment against those who support open source software. But there is a difference between software patents and the patent cases that ARM potentially has to deal with on the Mali. Yet we seem to have made patents our own kryptonite, and are way too easily lulled into backing off at the first mention of the word patent.

Patents are a poor excuse, as there is no direct relationship between an open source driver and the patent litigation around the hardware.

The Resources discussion.

As a hardware vendor (or IP provider) doing a free software driver never is for free. A lot of developer time does need to be invested, and this is an ongoing commitment. So yes, a viable open source driver for the Mali will consume some amount of resources.

Mr Davies states that MPD would have to incur this cost on its own, as MPD seems to be a completely separate unit and that further investment can only come from profit made within this group. In light of that information, I must apologize for ever having treated ARM and ARM MPD as one and the same with respect to this topic. I will from now on make it very clear that it is ARM MPD, and ARM MPD alone, that doesn't want an open source mali driver.

I do believe that Mr Davies his cost versus gain calculations are too direct and do not allow for secondary effects.

I also believe that an ongoing refusal to support an open source strategy for Mali will reflect badly on the sale of ARM processors and other IP, especially with ARM now pushing into the server market and getting into intel territory. The actions of ARM MPD do affect ARM itself, and vice versa. Admittedly, not as much as some with those that more closely tie the in-house GPU with the rest of the system, but that's far from an absolute lack of shared dependency and responsibility.

The Mali binary problem.

One person in the Q&A section asked why ARM isn't doing redistributable drivers like Nvidia does for the Tegra. Mr Davies answered that this was a good idea, and that linaro was doing something along those lines.

Today, ironically, I am the canonical source for mali-400 binaries. At the sunxi project, we got some binaries from the Cubietech people, built from code they received from allwinner, and the legal terms they were under did not prevent them from releasing the built binaries to the public. Around them (or at least, using the binaries as a separate git module) I built a small make based installation system which integrates with ARMs open source memory manager (UMP) and even included a quick GLES test from the lima tests. I stopped just short of debian packaging. The sunxi-mali repository, and the wiki tutorial that goes with it, now is used by many other projects (like for instance linux-rockchip) as their canonical source for (halfway usable) GPU support.

There are several severe problems with these binaries, which we have either fixed directly, have been working around or just have to live with. Direct fixes include adding missing library dependencies, and hollowing out a destructor function which made X complain. These are binary hacks. The xf86-video-fbturbo driver from Siarhei Siamashka works around the broken DRI2 buffer management, but it has to try to autodetect how to work around the issues, as it is differently broken on the different versions of the X11 binaries we have. Then there is the flaky coverage, as we only have binaries for a handful of kernel APIs, making it impossible to match them against all vendor provided SoC/device kernels. We also only have binaries for fbdev or X11, and sometimes for android, mostly for armhf, but not always... It's just one big mess, only slightly better than having nothing at all.

Much to our surprise, in oktober of last year, ARM MPD published a howto entry about setting up a working driver for mali midgard on the chromebook. It was a step in the right direction, but involved quite a bit off faff, and Connor Abbott (the brilliant teenager REing the mali shaders) had to go and pour things into a proper git repository so that it would be more immediately useful. Another bout of insane irony, as this laudable step in the right direction by ARM MPD ultimately left something to be desired.

ARM MPD is not like ATI, Nvidia, or even intel, qualcomm or broadcom. The Mali is built into many very different SoC families, and needs to be integrated with different display engines, 2D engines, media engines and memory/cache subsystems.

Even the distribution of drivers is different. From what i understand, mali drivers are handled as follows. The Mali licensees get the relevant and/or latest mali driver source code and access to some support from ARM MPD. The device makers, however, only rarely get their hands on source code themselves and usually have to make do with the binaries provided by the SoC vendor. Similarly, the device maker only rarely gets to deal with ARM MPD directly, and usually needs to deal with some proxy at the SoC vendor. This setup puts the responsibility of SoC integration squarely at the SoC vendor, and is well suited for the current mobile market: one image per device at release, and then almost no updates. But that market is changing with the likes of Cyanogenmod, and other markets are opening or are actively being opened by ARM, and those require a completely different mode of operation.

There is gap in Mali driver support that ARM MPDs model of driver delivery does not cater for today, and ARM MPD knows about this. But MPD is going to be fighting an upbill battle to try to correct this properly.

Binary solutions?

So how can ARM MPD try to tackle this problem?

Would ARM MPD keep the burden of making suitable binaries available solely with SoC vendors or device makers? Not likely as that is a pretty shakey affair that's actively hurting the mali ecosystem. SoCs for the mobile market have incredibly short lives, and SoC and device software support is so fragmented that these vendors would be responsible for backporting bugfixes to a very wide array of kernels and SoC versions. On top of that, those vendors would only support a limited subset of windowing systems, possibly even only android as this is their primary market. Then, they would have to set up the support infrastructure to appropriately deal with user queries and bug reports. Only very few vendors will end up even attempting to do this, and none are doing so today. In the end, any improvement at this end will bring no advantages to the mali brand or ARM MPD. If this path is kept, we will not move on from the abysmal situation we are in today, and the Mali will remain to be seen as a very fragmented product.

ARM MPD has little other option but to try to tackle this itself, directly, and it should do so more proactively than by hiding behind linaro. Unfortunately, to make any real headway here, this means providing binaries for every kernel driver interface, and the SoC vendor changes to those interfaces, on top of other bits of SoC specific integration. But this also means dealing with user support directly, and these users will of course spend half their time asking questions which should be aimed at the SoC vendor. How is ARM MPD going to convince SoC vendors to participate here? Or is ARM MPD going to maintain most of the SoC integration work themselves? Surely it will not keep the burden only at linaro, wasting the resources of the rest of ARM and of linaro partners?

ARM MPD just is in a totally different position than the ATIs and Nvidias of this world. Providing binaries that will satisfy a sufficient part of the need is going to be a huge drain on resources. Sure, MPD is not spending the same amount of resources on optimizing for specific setups and specific games like ATI or Nvidia are doing, but they will instead have to spend it on the different SoCs and devices out there. And that's before we start talking about different windowing infrastructure, beyond surfaceflinger, fbdev or X11. Think wayland, mir, even directFB, or any other special requirements that people tend to have for their embedded hardware.

At best, ARM MPD itself will manage to support surfaceflinger, fbdev and X11 on just a handful of popular devices. But how will ARM MPD know beforehand which devices are going to be popular? How will ARM MPD keep on making sure that the binaries match the available vendor or device kernel trees? Would MPD take the insane route of maintaining their own kernel repositories with a suitable mali kernel driver for those few chosen devices, and backporting changes from the real vendor trees instead? No way.

Attempting to solve this very MPD specific problem with only binaries, to any degree of success, is going to be a huge drain on MPD resources, and in the end, people will still not be satisfied. The problem will remain.

The only fitting solution is an open source driver. Of course, the Samsungs of this world will not ship their flagship phones with just an open source GPU driver in the next few years. But an open source driver will fundamentally solve the issues people currently have with Mali, the issues which fuel both the demand for fitting distributable binaries and for an open source driver. Only an open source driver can be flexible and cost-effective enough to fill that gap. Only an open source driver can get silicon vendors, device makers, solution creators and users chipping in, satisfying their own, very varied, needs.

Change is coming.

The ARM world is rapidly changing. Hardware review sites, which used to only review PC hardware, are more and more taking notice of what is happening in the mobile space. Companies that are still mostly stuck in embedded thinking are having to more and more act like PC hardware makers. The lack of sufficiently broad driver support is becoming a real issue, and one that cannot be solved easily or cheaply with a quick binary fix, especially for those who sell no silicon of their own.

The Mali marketing show on Anandtech tells us that things are looking up. The market is forcing ARM MPD to be more open, and MPD has to either sink or swim. The next step was demonstrated by yours truly and some other very enterprising individuals, and now both Nvidia and Broadcom are going all the way. It is just a matter of time before ARM MPD has to follow, as they need this more than their more progressive competitors.

To finish off, at the end of the Q&A session, someone asked: "Would free drivers gives greater value to the shareholders of ARM?". After a quick braindump, i concluded "Does ARMs lack of free drivers hurt shareholder value?" But we really should be stating "To what extent does ARMs lack of free drivers hurt shareholder value?".
link5 comments|post comment

Graphics DevRoom at FOSDEM2014. [Oct. 12th, 2013|11:35 pm]
Luc Verhaegen
[Tags|, , , , , , , , , , , , , , , ]
[Current Location |Rumpelkammer]
[mood |excitedexcited]
[music |Chris Clark - Vengeance drools]

Yes, there is going to be another exciting DevRoom about graphics on the upcoming FOSDEM.

It's not called the X.org DevRoom this time round, but a hopefully more general Graphics DevRoom. As was the case with the X.org DevRooms before, anything related to graphics drivers and windowing systems goes. While the new name should make it clearer that this DevRoom is about more than just X, it also doesn't fully cover the load either, as this explicitly does include input drivers as well.

Some people have already started wondering why I haven't been whining at them before. Well, my trusted system of blackmailing people into holding talks early on failed this year. The FOSDEM deadline was too early and XDC was too late, so I decided to take a chance, and request a devroom again, in the hope that enough people will make it over to the fantastic madness that is FOSDEM.

After endless begging and grovelling the FOSDEM organizers got so fed up that they gave us two full days again. This means that we will be able to better group things, and avoid a scheduling clash like with the ARM talks last year (where ARM system guys were talking in one room exactly when ARM graphics guys were talking in another). All of this doesn't mean that First Come, First Serve doesn't apply, and if you do not want to hold a talk with a hangover in an empty DevRoom, you better move quickly :)

The FOSDEM organizers have a system called pentabarf. This is where everything is tracked and the schedules are created, and, almost magically, at the other end, all sorts of interesting things fall out, like the unbelievably busy but clear website that you see every year. This year though, it is expected that speakers themselves manage their own details, and that the DevRoom organizers oversee this, so we will no longer use the trusted wiki pages we used before. While i am not 100% certain yet, i think it is best that people who have spoken at the DevRoom (most of whom i will be poking personally anyway) in the past few years first talk to me first before working with pentabarf, as otherwise there will be duplicate accounts which will mean more overhead for everyone. More on that in the actual call for speakers email which will hit the relevant mailing lists soon.

FOSDEM futures for ARM

Connor Abbott and I both have had chromebooks for a long long time. Connor bought his when it first came out, which was even before the last FOSDEM. I bought mine at a time where I thought that Samsung was never going to sell it in germany, and the .uk version arrived on my doorstep 3 days before the announcement for Europe went out. These things have been burning great big holes in our souls ever since, as i stated that we would first get the older Mali models supported properly with our Lima driver, and deliver a solid graphics driver before we lose ourselves again in the next big thing. So while both of us had this hardware for quite a while, we really couldn't touch these nice toys with an interesting GPU at all.

Now, naturally, this sort of thing is a bit tough to impose on teenagers, as they are hormonally programmed to break rules. So when Connor got bored during the summer (as teenagers do), he of course went and broke the rules. He did the unspeakable, and grabbed ARMs standalone shader compiler and started REing the Mali Midgard ISA. When his father is at FOSDEM this year, the two of us will have a bit of 'A Talk' about Connors wild behaviour, and Connor will be punished. Probably by forcing him to finish the beers he ordered :)

Luckily, adults are much better at obeying the rules. Much, much better.

Adults, for instance, would never go off and write a command stream tracer for this out of bounds future RE project. They would never ever dare to replay captured command streams on the chromebook. And they definitely would not spend days sifting through a binary to expose the shader compiler of the Mali Midgard. Such a thing would show weakness in character and would just undermine authority, and I would never stoop so low.

If I had done such an awful thing, then I would definitely not be talking about how much harder capture and replay were, err, would be, on this Mali, and that the lessons learned on the Mali Utgard will be really useful... In future? I would also not be mentioning how nice it would be to work on a proper linux from the get-go. I would also never be boasting at how much faster Connor and I will be at on turning our RE work on T6xx into a useful driver.

It looks like Connor and I will have some very interesting things to own up to at FOSDEM :)
link10 comments|post comment

Intel & Mir: The point-of-view of a graphics driver developing bystander. [Sep. 10th, 2013|01:22 am]
Luc Verhaegen
[Tags|, , , , , , , , ]
[Current Location |Rumpelkammer]
[mood |annoyedannoyed]
[music |Amon Tobin - Sordid]

Only a few days ago did I write about how open source software is not about "code or design or doing The Right Thing". "Open source software is about power, politics, corporate affiliation, and loads and loads of noise." I would like to thank Intel for so succinctly underlining that now with their current action.

Before I go any further, this seems to not be Chris Wilsons decision or his preferred solution. Chris wrote the patch which he was told, by unnamed party or parties in Intel, to back out. Also, I personally do not condone the actions taken by Canonical, but, as a graphics driver developer, I find Intels actions far worse. I rather doubt that Intel thought this one through properly.

What's the problem?

As a graphics driver developer, I fail to see the big problem with Mir.

So what if Canonical has decided to reinvent Wayland? Apart from the weird contribution agreement (which will only limit contributions), Mir is fully free software isn't it? Who are they hurting apart from their own resources and their own users? It's not that I am applauding Canonical for their decision, but I really don't see the massive problem here.

Why is Canonical not allowed to do this?

Reinvention galore

I personally really hate things being reinvented all the time. It is the disease that plagues open source software, and that what makes sure that we don't have a growing linux market.

How often have we heard that something is outdated and broken and doesn't fit modern demands anymore? We are then invariably being told that something new is being built afresh, from the lessons learned of what was done "wrong" before, and that in a few months time, everything is going to be fantastic. Sadly, such timeframes never pan out, and while the known "errors" are fixed, everything else gets broken, which then has to be reinvented or ported as well (or which simply remains broken). And then several years down the line, things are still not perfect, and then someone else (or sometimes even the same person) goes off and implements the next great thing from scratch, again.

We never have something that just works, we just go from broken state to broken state. And nobody learns from this, nobody apparently ever states "Hang on, isn't that pretty much the same story we heard 3 years ago?"

To me, as a stupid shortsighted driver developer, Wayland seems like X reinvented. A server/client display architecture with the new lessons learned implemented, but with everything else broken. We've been waiting for getting all those little niggles worked out ever since 2009, and at one point, networking was added to Wayland making it even more of an X replacement.

So then Mir was announced... And suddenly the world was ablaze. Huge flamewars broke out everywhere and effigies of Mark Shuttleworth were getting burned in the forums. I found the Mir move quite ironic, at first, and thought that the outrage was quite out of proportion, but then I read this article. It is a who's who of reinventers, complaining about Canonical reinventing Wayland. I was appalled.

What exactly gives these people the sole monopoly on reinvention?

What is Intel afraid of?

How could Mir possibly threaten Wayland?

Intel is a pretty big company, and it probably has the largest contingent of open source developers devoted to graphics. It employs some of the brightest and most influential people in the business. On top of that, Wayland was there first, has had more time to mature, has had more applications and toolkits ported, and has a much larger mindshare. Most people would think that Waylands future is pretty secure.

So what could possibly be so much better about Mir that makes Mir such a big threat to Wayland that Intels graphics driver developers have to be told not to support XMir at all? Honestly, in the above constellation, how vastly more superior technically does Mir have to be to justify such an action? If Intel really feels that it has to react like this, well, then it might as well just throw in the towel and go Mir immediately, as Wayland clearly must be completely useless.

What a way to oust your own insecurity.

Software Fascism

Intel finds it necessary to play games with their X.org graphics driver, instead of having Wayland battle it out directly with Mir.

This kind of powerplay is quite insidious, and far more damaging than most people would expect. It completely skews the ability of software to compete on a fair and equal grounds, and hurts us all as it is mostly applied by those who are not able to compete properly, or those who feel as if they shouldn't need to bother to compete properly. It tends to favour the least technically advanced and the least morally acceptable.

The best example which I have come across so far is the RadeonHD versus Radeon battle. RadeonHD beat ATI by actually providing a solid open source driver in September 2007, and we at SuSE had a stated goal of being able to ship a solid open source driver on enterprise desktop rollouts. 3 months later, Radeon came around with support for the same hardware. It was technically inferior, and "borrowed" much of the hard work of radeonHD plus some noise added on top. What was worse was how the so-called X.org community used software fascism to artificially boost the Radeon driver. This started out with the refusal of a mailing list at the usual place, hit a low point with RadeonHD being dropped from the build script for the xserver, and sank to whole new levels when, 2 years after the obvious death of the RadeonHD driver, the RadeonHD repository got vandalized (and the whistleblower got tarred and feathered while the perpetrators were commended for their "quick" confession).

So who won?

Well, it definitely was not RadeonHD, as that died early 2009 with Novell laying off a large portion of SuSE developers in Nuremberg. As luck had it, at the same time, AMD experienced serious financial difficulties and did not continue the RadeonHD project with SuSE. But although Radeon did survive, it did not win either. ATI won, AMD (which wanted a proper open source driver, whereas ATI seriously didn't) lost, and we all lost with it. Fglrx still rules supreme today, but now it does not get as much flack as it did before, as the figleaf driver provides some sort of an alternative for those who are unhappy with fglrx. But it goes beyond that, the radeon driver consistently applies or applied the solutions ATI fglrx developers recommended, instead of the empirical solutions we at RadeonHD usually chose, and the radeon driver is not as good as it could be.

Software fascism goes further than just badly skewing competition, and it always is a negative influence on software. Who knows what other bad decisions will make their way into the Intel driver now?

The responsibility of a graphics driver

The main responsibility of a graphics driver is to support the users of your graphics hardware. If you are actually employed by the vendor, your users are those who bought your hardware and who will buy your hardware again if they are satisfied. This is the business case for providing optimal support for your hardware for a given operating system or infrastructure. On top of that, in open source software, the users are more than just the customers, they are also the testers.

Canonicals plan and marketing seem to have worked out quite well over the years, to the extent that half the planet thinks that linux equals ubuntu, and ubuntu probably has the larger part of the linux desktop market. This means that ubuntu users are a sizable portion of Intels userbase, and as a hardware vendor (and only secondarily a maker of display servers), Intel simply cannot afford to refuse to support or even alienate these users. Canonical has decided that Mir will be the primary display server on future Ubuntu releases, and this in turn means that Intel has an obligation to support Mir.

The Xmir patch to the Intel graphics driver seems rather minimal and not very invasive. There also seems or seemed, as the case may be now, direct communication between Intels graphics driver developers and Ubuntus developers. As Mir will ship on the next Ubuntu versions, there will be a large amount of users which will test the Xmir code in the Intel graphics driver. There is no chance that the Xmir code will bitrot for the foreseeable future, and Intels own investment in this code will be minimal.

The real art of writing good drivers is to provide for quick and painless debugging. Graphics hardware is complex, the drivers for this hardware are also complex, and neither is ever perfect, so one has to work hard to maximize the chance for bug resolution. This means easy communication with users, and giving the user an easy route to test changes so that proper feedback can be provided quickly. If you fail to make it easy enough for users, you will simply not get your bugs fixed, and the higher the resolution threshold becomes, the worse your driver will become.

By not carrying this patch, Intel forces Ubuntu users to only report bugs to Ubuntu, which then means that only few bug reports will filter through to the actual driver developers. At the same time, Ubuntu users cannot simply test upstream code which contains extra debugging or potential fixes. Even worse, if this madness continues, you can imagine Intel stating to its customers that they refuse to fix bugs which only appear under Mir, even though there is a very very high chance of these bugs being real driver bugs which are just exposed by Mir.

The reality of the matter is, Intel is hurting its own graphics driver more than it could potentially hurt Mir or Canonical.

The andriodization of linux

The biggest installed base of Linux is android, and it is bigger by many orders of magnitude. Sadly the linux which we call android is little more than the linux kernel and some new-ish (mostly) open source infrastructure on top. While this, to some extent, is quite the boon for open source software, it also holds a major threat. If we are not careful, we get fully locked hardware. We are only sporadically able to enforce the GPL on the kernel, and we have no chance at all to get open source userspace drivers. This limits the usefulness of the now ubiquitous linux hardware out there, and with the way the desktop and mobile are evolving, this will soon limit the availability of hardware for which more-or-less complete open source is available. On top of that, all those electronics companies that are churning out hardware at an amazing rate, they are either unable to see the advantages of actively contributing to open source, or they are having a very hard time in learning how to do so.

This is exactly why I created the lima driver, and why some other brave souls created their respective GPU reverse engineering projects. We recognized this danger, and are sacrificing a large portion of our lives trying to prevent catastrophe. And even though things are not going as fast or as smooth as we expected, we have come a very long way.

Things took a wrong turn a while back though. In an effort to create a stopgap solution, Jolla developer Munk created libhybris, a wrapper library which allows the usage of android drivers on top of glibc, and thus on a normal linux installation. I find this hack pretty dangerous, as it makes all vendors complacent, and it cements the android way of working and the it makes binary drivers the default. Our biggest open source hopes for mobile; Sailfish, Firefox-OS and Ubuntu-Phone Mir readily embraced this way of working.

I have, so far, not seen anything from either Jolla, The Mozilla Foundation or Canonical, along the lines of active support of the route we have chosen with open ARM GPU drivers, and we've been at it for quite some time now. Those companies are more dependent on open source software than your average android vendor, and know how to do things the open source way, but they have fully embraced the binary drivers built for android only, with no signs of them wanting to change this.

The only reason why I favour Wayland over Mir is that Canonical immediately chose the libhybris route with Mir. Wayland currently has patches for libhybris, so soon Wayland sadly will have sunk to Mirs level as well, from a graphics driver point of view.

Intel employs a small army for their open source software, and specifically for their open source graphics driver. But Intel also has other teams working on graphics drivers, and while I am not certain, I do think that Intel ships binary only drivers on their android devices.

Canonical is happy with using libhybris, but currently would prefer to use a proper graphics driver for their future products. This preference now got significantly reduced. Intel now potentially has driven one of the last big users of open source graphics drivers to exclusively using android binaries as well, seriously reducing the relevance of its own OSTC driver developer team in the process.

The low road

Up until now, intel had the moral high ground in the Wayland versus Mir situation. With the simple decision to revert the Xmir patch, this situation now got reversed.

Well done.
link60 comments|post comment

The lima mesa driver runs es2gears. [Sep. 4th, 2013|12:57 am]
Luc Verhaegen
[Tags|, , , , , , , , ]
[Current Location |Rumpelkammer]
[mood |melancholic]
[music |Lemon Jelly - Ramblin' man]


Progress is slow but steady on the lima mesa driver, mostly because I am not giving lima as much time as I should. I now have working attributes, uniforms, vertex buffers and even some state is being set correctly. Enough to run es2gears. Here is the video on youtube, there is an older capture with a rotating smoothed cube as well.

This lima mesa driver uses the old-school (but contrary to popular belief, definitely not deprecated) mesa infrastructure, which, with surprisingly little work, allows me to run the mali binary drivers compiler, and allows me to build the lima driver externally. Using the binary compiler would not be possible from gallium, and with some improvements to the (intel developed, and thus intel focused) mesa glsl compiler, it seems that we might have a potent compiler for the mali ISAs as well. This way, the task of bringing up mesa on mali is nicely split, and we will be able to debug the command stream work separately from the shader compiler. I believe that we did lose the ability to run gles1 programs until Connors open-gpu-tools is hooked up to the mesa glsl compiler. This is just a small price to pay, in comparison to the size of the hurdle to take when doing everything in one go.

We are running es2gears at 130fps on the A10, and 310fps on the Exynos for a 300x300 render. Resizing is currently broken, or rather not implemented. I will need to split up the PLBU command stream to be able to do that properly under DRI2 (where resizes happen behind the mesa drivers back). The way DRI2, mesa and the sunxifb X driver now work together also means that i have to wait for jobs to finish (and then usleep for good measure), so there is a lot of potential for speed-up as well. I am not sure yet how things will have to be hacked to keep X from copying the region before rendering is absolutely done, i guess that we will have to hack something into UMP and the sunxifb driver. But a solution will be found, and we should see around a 50% increase in framerate from that, and even much much more if we manage to use overlays. Since we are in control of all the code now, we should theoretically be able to squeeze every last bit of performance out of the GPU, a luxury not offered to the users of the binary X11 driver.

I will now continue implementing textures, so that i can run all the limare egl tests. After that i will clean up the code and push it out. This will include patches to common mesa versions (and mesa packages) to allow building lima against them. Resizing and job interleaving will have to wait until after that, so keep your eyes peeled on this space :)

10 years in.

A few weeks ago, it was the 10th anniversary of my first contribution to X (a small display fix to the via driver). I cannot state that this anniversary was a happy event.

Looking back, I cannot believe that i once thought that in open source software, code, design and doing The Right Thing, both technically and morally, were paramount. Open source software is about power, politics, corporate affiliation, and loads and loads of noise. Noise and misinformation always wins over code, no matter how good this code is or how hard you work at it. I have had to learn this several times over.

This is especially true in the case of forks. Not the git clone variation, but the loud, aggressive and very detrimental community kind. While often technical reasons are claimed to be the cause, this is never ever the case. It always is about politics and power. And code always suffers as a result, and this suffering is never a short term thing, especially on big forks. See, a fork always means a big stink, a lot of noise, and a power vacuum, and this attracts a certain kind of individual. These personalities form the basis of the new community together with those who instigated or accelerated the fork. Good code simply has no chance in such an environment. Bad code and bad design tends to hang around for a long long time, and tends to influence (read, limit) the thinking of any new blood that turns up. On top of that, the bad mentality also tends to linger for many many years. Politics and noise continues to take precedence over code and design, for the foreseeable future.

The only thing to do in case of a fork, is to go play somewhere else, somewhere where a major fork hasn't taken place. If you don't, or if you do not go far enough, you will see your work impacted, especially if you are not willing to let yourself be limited by the existing mentality and powerbalance.

Now if only i wasn't such a stubborn bastard ;)
link10 comments|post comment

Old and new limare code, and management overhead... [Jun. 4th, 2013|03:16 am]
Luc Verhaegen
[Tags|, , , , , , , , , , ]
[Current Location |Rumpelkammer]
[mood |chipperchipper]
[music |Hole - Live through this 04 - Asking for it]

I just pushed updated limare code and a fix to ioquake3.

In almost 160 patches, loads of things change:

  • clean FOSDEM code supporting Q3A timedemo on a limare ioquake3.

  • support for r3p2 kernel and binary userspace as found on the odroid-x series.

  • multiple PP support, allowing for the full power of the mali 400MP4 to be used.

  • fully threaded job handling, so new frames can be set up while the first is getting rendered.

  • multiple textures, in rgb888, rgba8888 and rgb565, with mipmapping.

  • multiple programs.

  • attribute and elements buffer support.

  • loads of gl state is now also handled limare style.

  • memory access optimized scan pattern (hilbert) for PP (fragment shader).

  • direct MBS (mali binary shader) loading for pre-compiled shaders (and OGT shaders!!!).

  • support for UMP (arm's in kernel external memory handler).

  • Properly centered companion cube (now it is finally spinning in place :))

  • X11 egl support for tests.

  • ...

Some of this code was already published to allow the immediate use of the OGT enabled ioquake3. But that branch is now going to be removed, as the new code replaces it fully.

As for performance, this is no better or worse than the FOSDEM code. 47fps on timedemo on the Allwinner A10 at 1024x600. But now on the Exynos 4, there are some new numbers... With the CPU clocked to 2GHz and the Mali clocked to 800MHz (!!!) we hit 145fps at 720p and 127fps at 1080p. But more on that a bit further in this post.

Upcoming: Userspace memory management.

Shortly after FOSDEM, i blogged about the 2% performance advantage over the binary driver when running Q3A.

As you might remember, we are using ARMs kernel driver, and despite all the pain that this is causing us due to shifting IOCTL numbers (whoever at ARM decided that IOCTL numbers should be defined as enums should be laid off immediately) I still think that this is a useful strategy. This allows us to immediately throw in the binary driver, and immediately compare Lima to the binary, and either help hard reverse engineering, or just make performance comparisons. Rewriting this kernel driver, or turning this into a fully fledged DRM driver is currently more than just a waste of time, it is actually counterproductive right now.

But now, while bringing up a basic mesa driver, it became clear that I needed to work on some form of memory management. Usually, you have the DRM driver handling all of that, (even for small allocations i think - not that i have checked). We do not have a DRM driver, and I do not intend to write one in the very near future either, and all I have is the big block mapping that the mali kernel driver offers (which is not bad in itself).

So in the train on the way back from linuxtag this year, I wrote up a small binary allocator to divide up the 2GB of address space that the Mali MMU gives us. On top of that, I now have 2 types of memory, sequential and persistent (next to UMP and external, for mapping the destination buffer into Mali memory), and limare can now allocate and map blocks of either at will.

The sequential memory is meant for per-frame data, holding things like draws and varyings and such, stuff that gets thrown away after the frame has been rendered. This simply tracks the amount of memory used, adds the newly requested memory at the end, and returns an address and a pointer. No tracking whatsoever. Very lightweight.

The persistent memory is the standard linked list type, with the overhead that that incurs. But this is ok, as this memory is meant for shaders, textures and attribute and element buffers. You do not create these _every_ draw, and you tend to reuse them, so it's acceptable if their management is a bit less optimized.

Normally, more management makes things worse, but this memory tracking allowed me to sanitize away some frame specific state tracking. Suddenly, Q3A at 720p which originally ran at 145fps on the exynos, ran at 176fps. A full 21% faster. Quite some difference.

I now have a board with a Samsung Exynos 4412 prime. This device has the quad A9s clocked at 1.7GHz, 2GB LP-DDR2 memory at 880MHz, and a quad PP Mali-400MP4 at 440MHz. This is quite the powerhouse compared to the 1GHz single A8 and single PP Mali-400 at 320MHz. Then, this Exynos chip I got actually clocks the A9s to 2GHz and the mali to a whopping 800MHz (81% faster than the base clock). Simply insane.

The trouble with the exynos device, though, is that there are only X11 binaries. This involves a copy of the rendered buffer to the framebuffer which totally kills performance. I cannot properly compare these X11 binaries with my limare code. So I did take my new memory management code to the A10 again, and at 1024x600 it ran the timedemo at 49.5fps. About a 6% margin over the binary framebuffer driver, or tripling my 2% lead at FOSDEM. Not too bad for increased management, right?

Anyway, with the overclocking headroom of the exynos, it was time for a proper round of benchmarking with limare on exynos.

Benchmark, with a pretty picture!

Limare Q3A benchmark results on exynos4412

The above picture, which I quickly threw together manually, maps it out nicely.

Remember, this is an Exynos 4412 prime, with 4 A9s clocked from 1.7-2.0GHz, 2GB LP-DDR2 at 880MHz, and a Mali-400MP4 which clocks from 440MHz to an insane 800MHz. The test is the quake 3 arena timedemo, running on top of limare. Quake 3 Arena is single threaded, so apart from the limare job handling, the other 3 A9 cores simply sit idle. It's sadly the only good test I have, if someone wants to finish the work to port Doom3 to gles, I am sure that many people will really appreciate it.

At 720p, we are fully CPU limited. At some points in the timedemo (as not all scenes put the same load on cpu and/or gpu), the difference in mali clock makes us slightly faster if the cpu can keep up, but this levels out slightly above 533MHz. Everything else is simply scaling linearly with the cpu clock. Every change in cpu clock is a 80% change in framerate. We end up hitting 176.4fps.

At 1080p, it is a different story. 1080p is 2.25 times the amount of screen real estate of 720p (if that number rings a bell, 2.25MB equals two banks of Tseng ET6x00 MDRAM :p). 2.25 times the amount of pixels that need to pushed out. Here clearly the CPU is not the limiting factor. Scaling linearly from the original 91fps at 440MHz is a bit pointless, as the Q3A benchmark is not always stressing CPU and GPU equally over the whole run. I've drawn the continuation of the 440-533MHz increase, and that would lead to 150fps, but instead we run into 135.1fps. I think that we might be stressing the memory subsystem too much. At 135fps, we are pushing over 1GBps out to the framebuffer, this while the display is refreshing at 60fps, so reading in half a gigabyte. And all of this before doing a single texture lookup (of which we have loads).

It is interesting to see the CPU become measurably relevant towards 800MHz. There must be a few frames where the GPU load is such that the faster CPU is making a distinguishable difference. Maybe there is more going on than just memory overload... Maybe in future i will get bored enough to properly implement the mali profiling support of the kernel, so that we can get some actual GP and PP usage information, and not just the time we spent waiting for the kernel job to return.

ARM Management and the Lima driver

I have recently learned, from a very reliable source, that ARM management seriously dislikes the Lima driver project.

To put it nicely, they see no advantage in an open source driver for the Mali, and believe that the Lima driver is already revealing way too much of the internals of the Mali hardware. Plus, their stance is that if they really wanted an open source driver, they could simply open up their own codebase, and be done.


We can debate endlessly about not seeing an advantage to an open source driver for the Mali. In the end ARMs direct customers will decide on that one. I believe that there is already 'a slight bit of' traction for the general concept of open source software, I actually think that a large part of ARMs high margin products depend on that concept right now, and this situation is not going to get any better with ARMv8. Silicon vendors and device makers are also becoming more and more aware of the pain of having to deal with badly integrated code and binary blobs. As Lima becomes more complete, ARMs customers will more and more demand support for the Lima driver from ARM, and ARM gets to repeat that mantra: "We simply do not see the advantage"...

About revealing the internals of the Mali, why would this be an issue? Or, let me rephrase that, what is ARM afraid of?

If they are afraid of IP issues, then the damage was done the second the Mali was poured into silicon and sold. Then the simple fact that ARM is that apprehensive should get IP trolls' mouths watering. Hey IP Trolls! ARM management believes that there are IP issues with the Mali! Here is the rainbow! Start searching for your pot of gold now!

Maybe they are afraid that what is being revealed by the Lima driver is going to help the competition. If that is the case, then it shows that ARM today has very little confidence in the strength of their Mali product or in their own market position. And even if Nvidia or Qualcomm could learn something today, they will only be able to make use of that two years or even further down the line. How exactly is that going to hurt the Mali in the market it is in, where 2 years is an eternity?

If ARM really believes in their Mali product, both in the Mali's competitivity and in the originality of its implementation, then they have no tangible reason to be afraid of revealing anything about its internals.

Then there is the view that ARM could just open source their own driver. Perhaps they could, it really could be that they have had very strict agreements with their partners, and that ARM is free to do what they want with the current Mali codebases. I personally think it is rather unlikely that everything is as watertight as ARM management imagines. And even then, given that they are afraid of IP issues... How certain are ARMs lawyers that nothing contentious slipped into the code over the years? How long will it take ARMs legal department to fully review this code and assess that risk?

The only really feasible solution tends to be a freshly written driver, with a full development history available publically. And if ARM wants to occupy their legal department, then they could try to match intel (AMD started so well, but ATI threw in the towel so quickly, but luckily the AMD GPGPU guys continued part of it), and provide the Technical Reference Manual and other documents to the Mali. That would be much more productive, especially as that will already be more legal overhead than ARM management would be willing to spare, when they do finally end up seeing the light.

So. ARM management hates us. But guess what. Apart from telling us to change our name (there was apparently the "fear" of a trademark issue with us using Remali, so we ended up calling it Lima instead), there was nothing that they could do to stop us a year and a half ago. And there is even less that ARM can do to stop us today :)

A full 6.0%...
link21 comments|post comment

Q3A with open source generated shaders! [Mar. 18th, 2013|08:28 pm]
Luc Verhaegen
[Tags|, , , , , , , , , ]
[Current Location |Rumpelkammer]
[mood |Happy]
[music |Cornershop - Brimful of asha (norman cook remix)]

The combination of limare and open-gpu-tools can now run Quake 3 Arena timedemo without depending on the binary driver for the shader compiler!

Connor Abbott has been being his amazing (16y old!) self again in the weeks after his talk at FOSDEM, and he pushed his compiler work in his open-gpu-tools tree to be able to handle basic vertex shaders. Remember that our vertex shader is a rather insane one, where the compiler has to work real hard on getting scheduling absolutely right. This is why an assembler for our vertex shader was not too useful and the most part of a compiler had to be written for it to generate useful results. A mammoth task, and Connor his vertex shader code is now larger than the code I have in my limare library.

So it was high time that we brought limare and OGT together to see what they were capable of with some basic shaders. Luckily, the Q3A GLES1 emulation has basic shaders, what a nice coincidence :)

So Connor turned my simple vertex shader essl into the high level language used by the OGT vertex shader compiler, and through steps described at this wiki page, turned them into MBS files (Mali Binary Shader - the file type output by the standalone compiler, and also by newer binary driver integrated compilers). Limare can then load and parse those MBS files, and run the shaders. No need to involve the ARM binary anymore when we have OGT generated MBS files :)

The result was quite impressive. We had a few issues where the limare driver (which has mostly taken its cues from the output of the binary driver) and OGT disagreed over symbol layout, but apart from that, bringing up the shaders connor produced was pretty painless. Amazingly effortless, for such a big step.

Connor then spent another day playing with the fragment shader assembler, fixed some bugs, and produced 3 fragment shaders for us. One for the clear shader used by limare directly, and 2 for Q3A. After some more symbol layout issues, these also just worked! We even seem to be error-margin faster with the MBS files (due to texture coordinate varyings being laid out differently).

So this is a really big milestone for the lima driver project. Even with our insane pre-optimized architecture, we now are able to run Quake 3 Arena without any external dependencies, and we are beating the ARM binary while doing so.

For generating your own shader MBS files, check out Connors OGT, and then you can head straight to Connors wiki page. My Q3A tree now has the MBS code included directly. And i pushed a dirty version of my FOSDEM limare code.

As for this new limare code, this fosdem_2013_pile branch will vanish soon, as i need to properly pry things apart still. This is run-for-the-price code, and often includes many unrelated fixes in the same commit. It's better to do archeology on it now, than 3y from now, so this needs to be split. But in the meantime, you all can go and give Q3A on a fully free driver stack on Mali hw a go :)

I will not post a video, as there really is nothing new to see. It is the exact same timedemo, running some promille faster. Build things, and then run it yourself on your sunxi hardware (i am still working on porting it to the new kernel of a more powerful platform). That's the best proof there is!

For building limare, check out the fosdem2013_pile branch and then just run make/make install.

For building Q3A all you need to do is run:
make ARCH=arm USE_LIMARE=1
And, when you have the full quake installed in ~ioquake3/baseq3, you can create a file called ~ioquake3/baseq3/demofour.cfg with the following content:
cg_drawfps 1
timedemo 1
set demodone  "quit"
set demoloop1 "demo four; set nextdemo vstr demodone"
vstr demoloop1
You can then run the ioquake3 binary with "+exec demofour.cfg" added to the command line, and you will have the demo running on top of fully free software!

Now we really have covered all the basics, time to find out how Mesa will play with our plans :)
link19 comments|post comment

Quake 3 Arena timedemo on top of the lima driver! [Feb. 6th, 2013|02:08 am]
Luc Verhaegen
[Tags|, , , , , , , ]
[Current Location |Rumpelkammer]
[mood |accomplished]
[music |Cinematic Orchestra - The revolution]

At FOSDEM, I had a mainline talk about "Open ARM GPU Drivers", going over all the projects and GPUs, talking about the developers doing the hard reverse engineering work and the progress that they have made so far. I will write up a blog entry summarizing this talk soon, but for now i will just talk about the Lima demo i showed at the end of the talk.

Let me get straight to the main point before delving into details: We now have a limare (our proto/research driver) port of Quake 3 Arena which is running the q3a timedemo 2% faster than the binary driver. With 3% less cpu overhead than the binary driver to boot!

Here is the timedemo video up on youtube. It is almost pixel-perfect, with just a few rounding errors introduced due to us being forced to use a slightly different vertex shader (ESSL, pulled through the binary compiler instead of a hand coded shader). We have the exact same tearing as the binary drivers, which are also not synced to display on the linux-sunxi kernel (but ever so slightly more tearing than the original ;)).

This Q3A port is not playable for a few reasons. One is, i threw out the touchscreen input support, but never hacked in the standard SDL based input, so we have no input today. It should be easy to add though. Secondly, i only include the shaders that are needed for running the timedemo. The full game (especially its cut scenes) requires a few more shaders, which are even simpler than the ones currently included. I also need to implement the equivalent of glTexSubImage2d, as that is used by the cut-scenes. So, yes, it is not playable today, but it should be easy to change that :)

We are also not fully open source yet, as we are still using the binary shader compiler. Even after begging extensively, Connor was not willing to "waste time" on hand coding the few shaders needed. He has the necessary knowledge to do so though. So who knows, maybe when i push the code out (the q3a tree is a breeze to clean, but the lima code is a mess, again), he might still give us the few shaders that we need, and we might even gain a few promille performance points still :)

I will first be pushing out the q3a code, so that others can use the dumping code from it for their own GPU reverse engineering projects. The limare code is another hackish mess again (but not as bad as last time round), so cleaning that up will take a bit longer than cleaning up q3a.

Why frag like it is 1999?

Until now, i was mostly grabbing, replaying, and then porting, EGL/GLES2 programs that were specifically written for reverse engineering the mali. These were written by an absolute openGL/openGLES newbie, someone called libv. These tests ended up targetting very specific but far too limited things, and had very little in common with real world usage of the GPU. As most of the basic things were known for mali, it was high time to step up things a level.

So what real world OpenGL(ES) application does one pick then?

Quake 3 Arena of course. The demo four timedemo was the perfect next step for reverse engineering our GPU.

This 1999 first person shooter was very kindly open sourced by ID Software in 2005. Oliver McFadden later provided an openGLES1 port of ioquake3 for the Nokia N900. With the Mali binary providing an OpenGLES1 emulation library, it was relatively easy to get a version going which runs on the Mali binary drivers. Thank you Oliver, you will be missed.

The Q3A engine was written for fixed 3D pipelines and this has some very profound consequences. First, it limits the dependency on the shader compiler and allowed me to focus almost purely on the command stream. This completely fits with the main strategy of our reverse engineering project, namely it being 2 almost completely separate projects in one (command stream versus shader compilers). Secondly, and this was a nice surprise when i started looking at captures, the mali OpenGLES1 implementation had some very hardware specific optimizations that one could never expose with OpenGLES2 directly. Q3A ended up being vastly more educational than I had expected it to be.

With Q3A we also have a good benchmark, allowing us to get a better insight into performance for the first time. And on top of all of that, we get a good visual experience and it is a dead-certain crowdpleaser (and it was, thanks for the cheers guys :))

The only downside is that the data needed to run demo four is not available with the q3a demo release and therefor not freely downloadable. Luckily you can still find Q3A CDs on ebay, and i have heard that steam users can easily download it from there.

The long story

After linuxtag, where i demoed the rotating companion cube, I assumed that my knowledge about the mali was advanced enough that bringing up Q3A would take only a given number of weeks. But as these things usually go, and with work an real life getting in the way, it never pans out like that. January 17th is when i had q3a first work correctly, time enough to worry about some optimization still before FOSDEM, but only just enough.

I started with an android device and the kwaak3 "app", which is just Olivers port with some androidiness added. I captured some frames to find out what i still missed with limare. When i finally had some time available, i first spent it cleaning up the linuxtag code, which i pushed out early december. I had already brought up Q3A on linux-sunxi with the mali binary drivers, which can be seen from the video i then published on youtube.

One thing about the youtube video though... Oliver had a tiny error in his code, one that possibly never did show up on the N900. In his version of the texture loading code, the lightmaps original format would end up being RGB whereas the destination format is RGBA. This difference in format, and in-driver conversion, is not supported by the openGLES standard. This made the mali driver refuse to load the texture, which later on had the driver use only the primary texture, even though a second set of texture coordinates were attached to the command stream. The vertex shader did not reflect this, and in my openGL newbieness i assumed that Ben and Connor had a bug in their vertex shader disassembler. You can clearly see the flat walls in the video i posted. Once i fixed the bug though, q3a suddenly looked a lot more appealing.

I then started with turning the openGLES1 support code in Quake's GLimp layer into a dumper of openGLES1 commands and data in a way that made it easy to replay individual frames. Then i chose some interesting frames, and replayed them, turned them into a GLES2 equivalent (which is not always fully possible, alphaFunc comes to mind), and then improved limare until it ran the given frames nicely through (the mali has hw alphaFunc, so limare is able to do this directly too). Rince and repeat, over several interesting frames.

By the evening of January the 16th, i felt that i knew enough to attempt to write a GLimp for limare. This is exactly when my father decided to give me a call. Some have met him at Le Paon last Friday, when he, to my surprise, joined us for a beer after work as his office is not far away. He remarked that i seemed "a bit on edge" when he called on the 16th. Yes, i indeed was, and how could i be anything else at a time like this :) I hacked all night, as at the time i was living purely at night anyway, and minutes before my girlfriend woke up i gave it my first shot. Crash, a stupid bug in my code. I told my girlfriend that i wouldn't join her for "breakfast" before i went to bed, as i was simply way too close. By the time she left for work, i was able to run until the first few in-game frames, when the rendering would hang, with the mali only coming back several seconds later. After a bit of trying around, i gave the GP (vertex shader) a bit more space for its tile heap. This time it ran for about 800 frames before the same thing happened. I doubled the tile-heap again, and it ran all the way through!

The evening before i had hoped that i would get about 20fps out of this hardware. This already was a pretty cocky and arrogant guess, as the binary driver ran this demo at about 47.3fps, but i felt confident that the hardware had little to hide. And then the demo ran through, and produced a number.


Way beyond my wildest dreams. Almost 65% of the performance of the binary driver. Un-be-liev-ab-le. And this was with plain sequential job handling. Start a GP job, wait for it to finish, then start the PP job, wait for it to finish, then flip. 30.5fps still! Madness!

I had two weeks left for FOSDEM, so i had a choice, either add input support and invite someone from the public to come and play before the audience, or, optimize until we beat the binary driver. The framerate of the first pass decided that, optimization it was. I had a good benchmark, and only a third of the performance needed to be found, and most of the corners for that extra performance were known.

My first optimization was to tackle the PP polygon list block access pattern. During my previous talk at FOSDEM, i explained that this was the only bit I found that might be IP encumbered. In the meantime, over the weekly beers with Michael Matz, the SuSE Labs toolchain lead, i had learned that there is thing called the "hilbert space filling curve". Thanks Matz, that was worth about ~2.2fps. I benchmarked another few patterns: two level hilbert (inside plb block, and out), and the non-rotated hilbert pattern that is used for the textures. None would give us the same performance as the hilbert curve.

Building with -O3 then gave us another 1.5fps. Passing vec2s between the shaders gave us 0.3fps. It was time to put in proper interleaved job handling. With the help of Marcus Meissner (the SuSE Security lead), an ioctl struct sizing issue was found for the job wait thread. This fixed the reliability issues with threading on the r3p0 kernel of linux-sunxi. (ARM! Stable kernel interfaces now!) But thanks Marcus, as proper threading and interleaved job handling put me at 40.7 fps!

And then i got stuck. I only had 40.7fps and knew nothing that could account for such a big gap in performance. I tried a few things left and right, but nothing... I then decided to port q3a to GLES2 (with the loss of alphafunc and buggered up lamps as a result) to see whether our issue was with the compiled versus hand-coded shader. But I quickly ran into an issue with multi-texture program state tracking, which was curious, as the lima code was logically the same. Once this was fixed the GLES2 port ran at about 47.6fps, slightly faster than GLES1, which i think might be because of the lack of alphafunc.

Immediately after that i ported the multi-texture state tracking fix to the limare GLimp, but i sadly got no change in framerate out of it. Strangely, it seemed like there was no multitexturing going as my debugging printfs were not being triggered. I then noticed the flag for telling Q3A that our GL implementation supports multitexturing. Bang. 46.7fps. I simply couldn't believe how stupid that was. If that had been correct on the first run, i would've hit above 75% of the framerate, how insane would that have been :)

For the final 1.5fps, which put us at 48.2fps, i added a third frame, this while only rendering out to two framebuffers. Job done!

Adding a fourth frame did not improve numbers, and i left some minute cpu usage and memory usage optimizations untouched. We are faster than the binary driver, while employing no tricks. We know what we need to know about this chip and there is nothing left to prove with Q3A performance.

The numbers.

The fact that we are slightly faster is actually normal. We do not have to adhere to the OpenGLES standard, we can do without a lot of the checking that a proper driver normally needs to do. This is why the goal was not to match the binary driver's performance, but to beat it, which is exactly what we achieved. From some less PP and CPU bound programs, like the spinning cubes, it does seem that we are more aggressive with scheduling though.

Now let's look at some numbers. Here is the end of the timedemo log for the binary driver, on an Allwinner A10 (single cortex a8, at 1GHz), with a Mali-400MP1 at 320MHz, rendering to a 1024x600 LCD, with framerate printing enabled:
    THEINDIGO^7 hit the fraglimit.
    marty^7 was melted by THEINDIGO^7's plasmagun
    1260 frames 27.3 seconds 46.2 fps 10.0/21.6/50.0/5.6 ms
    ----- CL_Shutdown -----
    RE_Shutdown( 1 )

And here is the end of the timedemo log for the limare port:
    THEINDIGO^7 hit the fraglimit.
    marty^7 was melted by THEINDIGO^7's plasmagun
    ]64f in 1.313632s: 48.719887 fps (1280 at 39.473158 fps)
    1260 frames 26.7 seconds 47.2 fps 9.0/21.2/74.0/5.6 ms
    ----- CL_Shutdown -----
    RE_Shutdown( 1 )
    ]Max frame memory used: 2731/4096kB
    Auxiliary memory used: 13846/16384kB
    Total jobs time: 32.723190 seconds
       GP job  time: 2.075425 seconds
       PP job  time: 39.921429 seconds

Looking at the numbers from the limare driver, my two render threads are seriously overcommitted on the fragment shader (PP). We really are fully fragment shader bound, which is not surprising, as we only have a single fragment shader. Our GP is sitting idle most of the time.

It does seem promising for a quad core mali though. I will now get myself a quad-core A9 SoC, and put that one through its paces. My feeling is that there we will either hit a wall with memory bandwidth or with the CPU, as q3a is single threaded. Since limare does not yet support multiple fragment shaders the last remaining big unknown will get solved too.

Another interesting number is the maximum frame time. 50.0ms for the binary driver, versus 74.0ms for limare. My theory there is that i am scheduling differently than the original driver and that we get hit by us overcommitting the fragment shader. Wait and see whether this difference in scheduling will improve or worsen the numbers on the potentially 4 times faster SoC. We will not be context switching anymore with our render threads, and we will no longer be limited by the fragment shader. This should then decide whether another scheme should be picked or not.

Once we fix up the Allwinner A10 display engine, and can reliably sync to refresh rate, this difference in job scheduling should become mostly irrelevant.

The star: the mali by falanx.

In the previous section i was mostly talking about the strategy of scheduling GP and PP jobs, of which one tends to have 1 of each per frame. Performance optimization is a very high level problem on the mali, which is a luxury. On mali we do not need to bother with highly specific command queue patterns which most optimally use the available resources, which then ends up being SoC and board specific. We are as fast as the original driver without any trickery, and this has absolutely nothing to do with my supposed ability as a hacker. The credit fully goes to the design of the mali. There is simply no random madness with the mali. This chip makes sense.

The mali is the correct mix of the sane and the insane. All the real optimization is baked into the hardware design. The vertex shader is that insane for a reason. There is none of that "We can fix it in software" bullshit going on. The mali just is this fast. And after 20 months of throwing things at the mali, i still have not succeeded in getting the mali to hard or soft lockup the machine. Absolutely amazing.

When i was pretty much the only open source graphics developer who was pushing display support and modesetting forwards, I often had to hear that modesetting was easy, and that 3d is insane. The mali proves this absolutely wrong. Modesetting is a very complex problem to solve correctly, with an almost endless set of combinations that requires very good insight and the ability to properly structure things. If you fail to structure correctly, you have absolutely no chance of satisfying 99.9% of your users, you'll be lucky if you satisfy 60%. Compared to modesetting, 3D is clearly delineated, and it is a vastly more overseeable and managable problem... Provided that your hardware is sane.

The end of the 90s was an absolute bloodbath for graphics hardware vendors with just a few, suddenly big, companies surviving. That's exactly when a few Norwegian demo-sceners, at the Trondheim University, decided that they would do 3D vastly better than those survivors and they formed a company to do so, called Falanx. It must've seemed like suicide, and I am very certain that pretty much everybody declared them absolutely insane (like engadget did). Now, 12 years later, seeing what came out of that, I must say that I have to agree. Falanx was indeed insane, but it was that special kind of insanity that we call pure genius.

You crazy demo-sceners. You rock, and let this Q3a port be my salute to you.
link13 comments|post comment

[ viewing | most recent entries ]