LIBV Intentionally Breaks Videodrivers

Old and new limare code, and management overhead... [Jun. 4th, 2013|03:16 am]
Luc Verhaegen
[Current Location |Rumpelkammer]
[mood |chipper]
[music |Hole - Live through this 04 - Asking for it]

I just pushed updated limare code and a fix to ioquake3.

Across almost 160 patches, loads of things changed:

  • clean FOSDEM code supporting Q3A timedemo on a limare ioquake3.

  • support for r3p2 kernel and binary userspace as found on the odroid-x series.

  • multiple PP support, allowing for the full power of the mali 400MP4 to be used.

  • fully threaded job handling, so new frames can be set up while the first is getting rendered.

  • multiple textures, in rgb888, rgba8888 and rgb565, with mipmapping.

  • multiple programs.

  • attribute and elements buffer support.

  • loads of gl state is now also handled limare style.

  • memory access optimized scan pattern (hilbert) for PP (fragment shader).

  • direct MBS (mali binary shader) loading for pre-compiled shaders (and OGT shaders!!!).

  • support for UMP (arm's in kernel external memory handler).

  • Properly centered companion cube (now it is finally spinning in place :))

  • X11 egl support for tests.

  • ...
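
For readers unfamiliar with the Hilbert scan pattern mentioned in the list: it visits tiles in an order where consecutive tiles are always direct neighbours, which is presumably what makes it friendly to memory access. Below is a minimal sketch in C of the textbook index-to-coordinate mapping. It is not taken from limare, and the names `hilbert_rot` and `hilbert_d2xy` are made up for illustration.

```c
#include <assert.h>

/* Rotate/flip a quadrant so that the sub-curves join up end to end. */
static inline void hilbert_rot(int n, int *x, int *y, int rx, int ry)
{
    if (ry == 0) {
        if (rx == 1) {
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        /* Swap x and y. */
        int t = *x;
        *x = *y;
        *y = t;
    }
}

/* Map curve index d (0 .. n*n-1) to (x, y) on an n x n grid.
 * n must be a power of two. */
static inline void hilbert_d2xy(int n, int d, int *x, int *y)
{
    int rx, ry, s, t = d;

    assert(n > 0 && (n & (n - 1)) == 0);
    assert(d >= 0 && d < n * n);

    *x = *y = 0;
    for (s = 1; s < n; s *= 2) {
        rx = 1 & (t / 2);
        ry = 1 & (t ^ rx);
        hilbert_rot(s, x, y, rx, ry);
        *x += s * rx;
        *y += s * ry;
        t /= 4;
    }
}
```

Walking d from 0 to n*n-1 and rendering tile (x, y) at each step gives the scan order; on a 2x2 grid this visits (0,0), (0,1), (1,1), (1,0).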

Some of this code was already published to allow the immediate use of the OGT enabled ioquake3. But that branch is now going to be removed, as the new code replaces it fully.

As for performance, this is no better or worse than the FOSDEM code. 47fps on timedemo on the Allwinner A10 at 1024x600. But now on the Exynos 4, there are some new numbers... With the CPU clocked to 2GHz and the Mali clocked to 800MHz (!!!) we hit 145fps at 720p and 127fps at 1080p. But more on that a bit further in this post.

Upcoming: Userspace memory management.

Shortly after FOSDEM, i blogged about the 2% performance advantage over the binary driver when running Q3A.

As you might remember, we are using ARM's kernel driver. Despite all the pain this causes us due to shifting IOCTL numbers (whoever at ARM decided that IOCTL numbers should be defined as enums should be laid off immediately), I still think that this is a useful strategy. It allows us to drop in the binary driver at any moment and compare Lima directly against it, either to help with hard reverse engineering or just to make performance comparisons. Rewriting this kernel driver, or turning it into a fully fledged DRM driver, is currently more than just a waste of time; it is actually counterproductive right now.
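
To show why enum-defined IOCTL numbers are so fragile, here is a minimal sketch in C. The command names are hypothetical and not copied from ARM's headers; the point is only that inserting a single entry silently renumbers everything after it, so userspace built against the old header ends up calling the wrong operation.

```c
#include <assert.h>

/* Hypothetical command list of an older kernel driver release.
 * Names are illustrative only, not ARM's actual definitions. */
enum old_release_cmd {
    OLD_CMD_GET_API_VERSION,  /* 0 */
    OLD_CMD_MEM_INIT,         /* 1 */
    OLD_CMD_START_JOB,        /* 2 */
};

/* The same list in a newer release, with one entry inserted. */
enum new_release_cmd {
    NEW_CMD_GET_API_VERSION,  /* 0 */
    NEW_CMD_GET_CORE_COUNT,   /* 1: newly inserted */
    NEW_CMD_MEM_INIT,         /* 2: was 1 */
    NEW_CMD_START_JOB,        /* 3: was 2 */
};

/* Non-zero if "start job" kept its number across the two releases. */
static inline int start_job_number_is_stable(void)
{
    return (int)OLD_CMD_START_JOB == (int)NEW_CMD_START_JOB;
}

/* Non-zero if the old "mem init" number now selects the new entry. */
static inline int old_mem_init_hits_new_entry(void)
{
    return (int)OLD_CMD_MEM_INIT == (int)NEW_CMD_GET_CORE_COUNT;
}
```

With explicit, fixed numbers per command, adding a new command would leave the existing ones untouched.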

But now, while bringing up a basic mesa driver, it became clear that I needed to work on some form of memory management. Usually, you have the DRM driver handling all of that, (even for small allocations i think - not that i have checked). We do not have a DRM driver, and I do not intend to write one in the very near future either, and all I have is the big block mapping that the mali kernel driver offers (which is not bad in itself).

So in the train on the way back from linuxtag this year, I wrote up a small binary allocator to divide up the 2GB of address space that the Mali MMU gives us. On top of that, I now have 2 types of memory, sequential and persistent (next to UMP and external, for mapping the destination buffer into Mali memory), and limare can now allocate and map blocks of either at will.

The sequential memory is meant for per-frame data, holding things like draws and varyings and such, stuff that gets thrown away after the frame has been rendered. This simply tracks the amount of memory used, adds the newly requested memory at the end, and returns an address and a pointer. No tracking whatsoever. Very lightweight.
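
A minimal sketch of such a sequential allocator, in C. This is not the limare code: the struct and function names, the 64-byte alignment, and the separate CPU pointer and GPU address fields are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame arena: one mapped block, seen by the CPU
 * through cpu_base and by the GPU at address gpu_base. */
struct seq_arena {
    uint8_t *cpu_base;
    uint32_t gpu_base;
    size_t size;
    size_t used;
};

/* An allocation is a CPU pointer plus the matching GPU address. */
struct seq_alloc {
    void *cpu;     /* NULL when the arena is exhausted */
    uint32_t gpu;
};

/* Hand out the next chunk at the end of the used area. */
static inline struct seq_alloc seq_arena_alloc(struct seq_arena *arena,
                                               size_t size)
{
    struct seq_alloc result = { NULL, 0 };
    /* Round up to 64 bytes; the alignment value is an assumption. */
    size_t aligned = (size + 63) & ~(size_t)63;

    assert(arena != NULL);
    /* Reject wrapped sizes and requests that no longer fit. */
    if (aligned < size || aligned > arena->size - arena->used)
        return result;

    result.cpu = arena->cpu_base + arena->used;
    result.gpu = arena->gpu_base + (uint32_t)arena->used;
    arena->used += aligned;
    return result;
}

/* Throw everything away once the frame has been rendered. */
static inline void seq_arena_reset(struct seq_arena *arena)
{
    arena->used = 0;
}
```

There is no per-allocation bookkeeping at all: freeing a whole frame's worth of data is a single store to `used`.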

The persistent memory is the standard linked list type, with the overhead that that incurs. But this is ok, as this memory is meant for shaders, textures and attribute and element buffers. You do not create these _every_ draw, and you tend to reuse them, so it's acceptable if their management is a bit less optimized.
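
For comparison, a linked list allocator over an address range might look roughly like the C sketch below. Again, this is not the limare code: first-fit placement, merging of free neighbours, and treating address 0 as "allocation failed" are assumptions, and all names are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One node per address range; the list stays sorted by address. */
struct range {
    uint32_t start;
    uint32_t size;
    int used;
    struct range *next;
};

/* Create a list covering a single free range. NULL on failure. */
static inline struct range *range_list_create(uint32_t start, uint32_t size)
{
    struct range *r = malloc(sizeof(*r));

    if (r) {
        r->start = start;
        r->size = size;
        r->used = 0;
        r->next = NULL;
    }
    return r;
}

/* First fit: take the first free range that is large enough and
 * split off the remainder. Returns 0 when nothing fits. */
static inline uint32_t range_alloc(struct range *head, uint32_t size)
{
    struct range *r;

    assert(size > 0);
    for (r = head; r; r = r->next) {
        if (r->used || r->size < size)
            continue;
        if (r->size > size) {
            struct range *rest = malloc(sizeof(*rest));

            if (!rest)
                return 0;
            rest->start = r->start + size;
            rest->size = r->size - size;
            rest->used = 0;
            rest->next = r->next;
            r->next = rest;
            r->size = size;
        }
        r->used = 1;
        return r->start;
    }
    return 0;
}

/* Mark the range at 'start' as free, then merge free neighbours. */
static inline void range_free(struct range *head, uint32_t start)
{
    struct range *r;

    for (r = head; r; r = r->next)
        if (r->used && r->start == start)
            r->used = 0;

    for (r = head; r && r->next; ) {
        if (!r->used && !r->next->used) {
            struct range *dead = r->next;

            r->size += dead->size;
            r->next = dead->next;
            free(dead);
        } else {
            r = r->next;
        }
    }
}
```

Every allocation and free walks the list, which is the overhead referred to above; that is tolerable for objects that are created rarely and reused often.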

Normally, more management makes things worse, but this memory tracking allowed me to sanitize away some frame specific state tracking. Suddenly, Q3A at 720p which originally ran at 145fps on the exynos, ran at 176fps. A full 21% faster. Quite some difference.

I now have a board with a Samsung Exynos 4412 prime. This device has the quad A9s clocked at 1.7GHz, 2GB LP-DDR2 memory at 880MHz, and a quad PP Mali-400MP4 at 440MHz. This is quite the powerhouse compared to the Allwinner A10, with its single A8 at 1GHz and single PP Mali-400 at 320MHz. On top of that, this Exynos chip I got actually clocks the A9s to 2GHz and the mali to a whopping 800MHz (about 82% faster than the base clock). Simply insane.

The trouble with the exynos device, though, is that there are only X11 binaries. This involves a copy of the rendered buffer to the framebuffer which totally kills performance. I cannot properly compare these X11 binaries with my limare code. So I did take my new memory management code to the A10 again, and at 1024x600 it ran the timedemo at 49.5fps. About a 6% margin over the binary framebuffer driver, or tripling my 2% lead at FOSDEM. Not too bad for increased management, right?

Anyway, with the overclocking headroom of the exynos, it was time for a proper round of benchmarking with limare on exynos.

Benchmark, with a pretty picture!

Limare Q3A benchmark results on exynos4412

The above picture, which I quickly threw together manually, maps it out nicely.

Remember, this is an Exynos 4412 prime, with 4 A9s clocked from 1.7-2.0GHz, 2GB LP-DDR2 at 880MHz, and a Mali-400MP4 which clocks from 440MHz to an insane 800MHz. The test is the quake 3 arena timedemo, running on top of limare. Quake 3 Arena is single threaded, so apart from the limare job handling, the other 3 A9 cores simply sit idle. It's sadly the only good test I have. If someone wants to finish the work to port Doom3 to gles, I am sure that many people will really appreciate it.

At 720p, we are fully CPU limited. At some points in the timedemo (not all scenes put the same load on cpu and/or gpu), a higher mali clock makes us slightly faster if the cpu can keep up, but this levels out slightly above 533MHz. Everything else simply scales linearly with the cpu clock: every change in cpu clock shows up as roughly 80% of that change in framerate. We end up hitting 176.4fps.

At 1080p, it is a different story. 1080p is 2.25 times the amount of screen real estate of 720p (if that number rings a bell, 2.25MB equals two banks of Tseng ET6x00 MDRAM :p). That is 2.25 times the number of pixels that need to be pushed out. Here clearly the CPU is not the limiting factor. Scaling linearly from the original 91fps at 440MHz is a bit pointless, as the Q3A benchmark is not always stressing CPU and GPU equally over the whole run. I've drawn the continuation of the 440-533MHz increase, and that would lead to 150fps, but instead we run into 135.1fps. I think that we might be stressing the memory subsystem too much. At 135fps, we are pushing over 1GB/s out to the framebuffer, this while the display is refreshing at 60fps, so reading in half a gigabyte per second. And all of this before doing a single texture lookup (of which we have loads).
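
The bandwidth figures above can be checked with a few lines of C. The sketch assumes 4 bytes per pixel (32-bit colour), which the post does not state explicitly; the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second moved when 'fps' full frames of width x height
 * pixels are written (rendering) or read (display scanout).
 * bytes_per_pixel = 4 is an assumption about the framebuffer format. */
static inline uint64_t frame_traffic(uint32_t width, uint32_t height,
                                     uint32_t bytes_per_pixel, uint32_t fps)
{
    return (uint64_t)width * height * bytes_per_pixel * fps;
}
```

Under that assumption, `frame_traffic(1920, 1080, 4, 135)` comes to 1,119,744,000 bytes per second written by the GPU, and `frame_traffic(1920, 1080, 4, 60)` to 497,664,000 bytes per second read by the display, matching "over 1GB/s" and "half a gigabyte". The 2.25 ratio follows from 1920x1080 = 2,073,600 pixels against 1280x720 = 921,600.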

It is interesting to see the CPU become measurably relevant towards 800MHz. There must be a few frames where the GPU load is such that the faster CPU is making a distinguishable difference. Maybe there is more going on than just memory overload... Maybe in future i will get bored enough to properly implement the mali profiling support of the kernel, so that we can get some actual GP and PP usage information, and not just the time we spent waiting for the kernel job to return.

ARM Management and the Lima driver

I have recently learned, from a very reliable source, that ARM management seriously dislikes the Lima driver project.

To put it nicely, they see no advantage in an open source driver for the Mali, and believe that the Lima driver is already revealing way too much of the internals of the Mali hardware. Plus, their stance is that if they really wanted an open source driver, they could simply open up their own codebase, and be done.


We can debate endlessly about not seeing an advantage to an open source driver for the Mali. In the end ARM's direct customers will decide on that one. I believe that there is already 'a slight bit of' traction for the general concept of open source software. I actually think that a large part of ARM's high margin products depend on that concept right now, and this situation is not going to get any better with ARMv8. Silicon vendors and device makers are also becoming more and more aware of the pain of having to deal with badly integrated code and binary blobs. As Lima becomes more complete, ARM's customers will more and more demand support for the Lima driver from ARM, and ARM gets to repeat that mantra: "We simply do not see the advantage"...

About revealing the internals of the Mali, why would this be an issue? Or, let me rephrase that, what is ARM afraid of?

If they are afraid of IP issues, then the damage was done the second the Mali was poured into silicon and sold. Then the simple fact that ARM is that apprehensive should get IP trolls' mouths watering. Hey IP Trolls! ARM management believes that there are IP issues with the Mali! Here is the rainbow! Start searching for your pot of gold now!

Maybe they are afraid that what is being revealed by the Lima driver is going to help the competition. If that is the case, then it shows that ARM today has very little confidence in the strength of their Mali product or in their own market position. And even if Nvidia or Qualcomm could learn something today, they will only be able to make use of that two years or even further down the line. How exactly is that going to hurt the Mali in the market it is in, where 2 years is an eternity?

If ARM really believes in their Mali product, both in the Mali's competitiveness and in the originality of its implementation, then they have no tangible reason to be afraid of revealing anything about its internals.

Then there is the view that ARM could just open source their own driver. Perhaps they could, it really could be that they have had very strict agreements with their partners, and that ARM is free to do what they want with the current Mali codebases. I personally think it is rather unlikely that everything is as watertight as ARM management imagines. And even then, given that they are afraid of IP issues... How certain are ARM's lawyers that nothing contentious slipped into the code over the years? How long will it take ARM's legal department to fully review this code and assess that risk?

The only really feasible solution tends to be a freshly written driver, with a full development history available publicly. And if ARM wants to occupy their legal department, then they could try to match intel (AMD started so well, but ATI threw in the towel so quickly, but luckily the AMD GPGPU guys continued part of it), and provide the Technical Reference Manual and other documents to the Mali. That would be much more productive, especially as that will already be more legal overhead than ARM management would be willing to spare, when they do finally end up seeing the light.

So. ARM management hates us. But guess what. Apart from telling us to change our name (there was apparently the "fear" of a trademark issue with us using Remali, so we ended up calling it Lima instead), there was nothing that they could do to stop us a year and a half ago. And there is even less that ARM can do to stop us today :)

A full 6.0%...

From: tanner_of_kha
2013-06-04 04:17 am (UTC)
Impressive work!

Out of curiosity, what A10 board or device are you using?
From: (Anonymous)
2013-06-05 10:55 am (UTC)
Odroid-X2 probably.
From: libv
2013-06-05 11:04 am (UTC)
I am using an A7HD (http://linux-sunxi.org/Hyundai_A7HD) as my mobile mali development system. But any device where you can bring up the linux-sunxi u-boot and kernel is going to work well.
From: (Anonymous)
2013-06-06 01:19 pm (UTC)
Woot, I always dreamed of getting rid of these damn ARM blobs on my A10 board. These blobs are the worst part of dealing with ARM based CPUs. If ARM management fails to recognize that, they surely must fire some management nuts and hire some adequate ones. Thanks for trying to make it happen.
From: (Anonymous)
2013-06-11 03:28 am (UTC)
Exactly. These management turds are dead weight. Every engineering team I've ever known has spent ~50% of their energy "managing from below" to try and stop the jumped up, "grand vision" pricks in management doing something stupid.
From: (Anonymous)
2013-06-04 08:04 am (UTC)
I wonder how Intel's Bay Trail will affect ARM's view of the importance of open source drivers?

I'm seeing pricing for the 5W Bay Trail that puts it firmly in contention against ARM-based competitors. If it's as good a SoC as the documentation suggests, ARM's remaining advantage is that the mobile ecosystem is currently strongly ARM-based - will that be enough to keep them in the money?
From: libv
2013-06-05 11:10 am (UTC)
ARM is going to have to start reconsidering their position towards intel. Intel SoCs are seriously catching up, and intel graphics have a massive team of open source developers working on it, making it the best supported and easiest to integrate GPU out there. This will cost ARM a lot of sales, and even samsung, arguably ARM's highest profile customer, is now starting to sell intel based mobile devices...

Intel is coming, in more than one way, and ARM just _completely_ dropped the ball on graphics drivers. Will ARM management manage to stave off this full-on attack? From what I witnessed, I am starting to doubt it. And I have now given them enough opportunity to play nice with open source software, from now on I consider ARM as more-or-less actively hostile towards free software.
From: (Anonymous)
2013-06-06 01:16 pm (UTC)
> and intel graphics have a massive team of open source developers working on it
That's why competition matters. The ARM guys have grown really arrogant and aren't afraid to show hostility to those who actually use their solutions and develop new things based on their processors. Isn't it silly?
From: (Anonymous)
2013-06-07 09:59 am (UTC)
ARM's original parent company (Acorn Computers Ltd) would probably still be around had they not been hostile to OSS. Instead they drove themselves out of business pursuing proprietary (STB) software dreams, all while an active and vibrant free software community (of which I was a member) could have done much of their software development for them, while they turned out desirable ARM based hardware. Porting Linux to ARM (which was probably key to ARM's current success) was a community project, following on from the porting of NetBSD/arm aka RiscBSD. ARM's current management probably carry this legacy on to some extent, but I'm sure it doesn't reflect the views of the CPU designers and engineers who do work with the OSS community.
From: (Anonymous)
2013-06-11 03:31 am (UTC)
If Intel start nailing their power issues (read: I don't want a noisy fan in any device within hearing range), I will forget about ARM. Intel certainly haven't got a spotless past but their open source contributions are duly noted and very much appreciated.
From: (Anonymous)
2013-06-06 01:13 pm (UTC)
> ARM management seriously dislikes the Lima driver project.
And the ARM management f...ks should learn that we (those who actually deal with ARM devboards) seriously dislike their damn awkward blobs. Their moron blobs don't even allow us to build a system as desired. So, dear ARM, Linux is a major part of the business. Just get off and don't interfere. Or, better, help us to have fun.
From: shmerl.blogspot.com
2013-06-06 05:22 pm (UTC)
Since this driver isn't DRM based, will Wayland benefit from it in any way? It would be great to run Wayland based distros on hardware with Mali GPUs.
From: libv
2013-06-08 04:27 pm (UTC)
The driver will be mesa/dri based, and there is a drm shim for the kernel driver. Wayland should be perfectly happy with that.
From: (Anonymous)
2013-06-06 07:19 pm (UTC)
Hi Luc,

Thanks so much for your work. Could you please include Gentoo support into Makefile.inc?

else ifeq ($(triplet), armv7a-hardfloat-linux-gnueabi)
OS = gnu


However, I cannot compile it. It fails during compilation of one of the tests:

make[3]: Entering directory `/root/lima/limare/tests/triangle_smoothed'
gcc -O0 -g -Wall -I../../../include -I../../lib/ -I../common -c -MD -o ../common/egl_common.o ../common/egl_common.c
../common/egl_common.c:51:27: error: array type has incomplete element type
make[3]: *** [../common/egl_common.o] Error 1
make[3]: Leaving directory `/root/lima/limare/tests/triangle_smoothed'
make[2]: *** [triangle_smoothed] Error 2
make[2]: Leaving directory `/root/lima/limare/tests'

Maybe you have a hint for me.

Thanks again!
From: libv
2013-06-08 04:26 pm (UTC)
What EGL headers are you using? Check out the eglplatform.h in the sunxi-mali repository.
From: Michal Lazo
2013-06-07 08:10 am (UTC)
Can you make a guide on how we can test or use the current version?
I am on an amlogic SoC with a mali400.
I want to test Qt5, for example :)
From: (Anonymous)
2013-06-07 08:40 am (UTC)
ARM Management: Fuck you!
From: ext_1979100
2013-06-07 04:48 pm (UTC)
Too bad you think we shouldn't have a proper kernel driver at this moment. The existing Mali kernel module will certainly not go upstream and I see this as an excuse for everyone to think that having no upstream GPU kernel driver is an acceptable situation.

If we had an upstream DRM driver for Lima, we (kernel developers) would be in a better position to tell others to do the same, and we could ensure that upstream kernels work fine with Lima but not the proprietary drivers.
From: libv
2013-06-08 04:25 pm (UTC)
Read up, i talked about this topic many many many times.
From: (Anonymous)
2013-06-08 04:41 pm (UTC)
ARM may not appreciate what you are doing, but there are many of us that do. Keep up the good work!
From: rappy811
2014-01-05 09:37 pm (UTC)
I don't remember where I have already come across almost the same post, but oh well.