Bad compression ratio for assembler code under crinkler?

category: general [glöplog]

Yup, OpenGL FSAA is just plain huge.

added on the 2009-05-02 21:11:31 by ferris

i am glad i don't code as this thread confounds and scares me

added on the 2009-05-02 21:48:43 by bfx

Quote:

i am glad i don't code as this thread confounds and scares me

Really? I think we've just scratched the surface...

+1! Great Thread.

added on the 2009-05-02 22:55:28 by torus

obviously more size efficient code will have a worse compression ratio than badly size efficient code. what do you think compression means?

added on the 2009-05-03 04:43:58 by Hatikvah

on another note, what do you think the word "ratio" means?

added on the 2009-05-03 04:44:37 by Hatikvah

Hatikvah: A little harsh...

But what he means is true; compressed SIZE is really what you want to compare...size-optimized code in general will have worse compression ratios than long, repetitive code, so the compression ratio really isn't important in comparisons.
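A rough sketch of that point, with loud assumptions: zlib stands in for crinkler's compressor (which works quite differently), SHA-256 output stands in for dense machine code, and `noise`, `tight` and `loose` are made-up names. The "loose" build gets the better ratio but is still the bigger file.

```python
import hashlib
import zlib

def noise(n, tag):
    """Deterministic bytes with no redundancy (hypothetical stand-in for dense code)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(tag + bytes([counter])).digest()
        counter += 1
    return out[:n]

def packed_size(data):
    # zlib is only a stand-in here, not what crinkler actually does
    return len(zlib.compress(data, 9))

# "Tight" build: 64 bytes with almost no redundancy.
tight = noise(64, b"core")

# "Loose" build: the same 64 bytes repeated 8 times, each copy followed
# by 8 bytes of extra, unique code.
loose = b"".join(tight + noise(8, b"extra" + bytes([i])) for i in range(8))

tight_packed = packed_size(tight)
loose_packed = packed_size(loose)
tight_ratio = tight_packed / len(tight)
loose_ratio = loose_packed / len(loose)

print(f"tight: {len(tight)} -> {tight_packed} bytes, ratio {tight_ratio:.2f}")
print(f"loose: {len(loose)} -> {loose_packed} bytes, ratio {loose_ratio:.2f}")
```

The number to compare between two builds is the packed size, not the ratio.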

added on the 2009-05-03 04:54:40 by ferris

Quote:

obviously more size efficient code will have a worse compression ratio than badly size efficient code

Indeed, the title of this thread is confusing.

Thanks Blueberry, this is helpful.

So, to summarize, to achieve better compression on ASM code for 4k intros:
1) I need to get to know x86 asm better, with size optimization in mind. At least, this point is for me. ;)
2) Produce code that is regular and uses a restricted set of instructions, so that the redundancy is predictable throughout the code.
3) Coding the whole demo in ASM is key to good compression, since it avoids irregularity between differently generated code (mixed C++/ASM vs. pure ASM).

I have one more question: I don't feel confident with FPU code and I would rather code with the SSE instruction set (using the xmm0-xmm7 registers), but the FPU instruction set seems to produce shorter code. On the other hand, using SSE registers makes optimization across a large routine easier (for example in a softsynth).
Do you think SSE instructions are too large to be worth using in a 4k?

added on the 2009-05-03 10:25:31 by xoofx

Quote:

i actually finished and debugged the kkrunchy 0.23a3 (and following versions) compressor under linux
Quote:
pls to release :(

+1

(sorry for thread-jacking, but I felt like I need to endure this)

added on the 2009-05-03 10:38:29 by LiraNuna

4) Try to repeat sequences even if it gives you redundant code. As mentioned, preserve registers you don't need to preserve if it makes the push/pop sequences surrounding all subroutines identical. When you do similar things in two places, try to make them identical, or at least as similar as possible with regard to register usage etc.

5) Repeating sequences pack so well that it very often makes sense to unroll loops to avoid the loop overhead. E.g. in ryg's example:

Code:

      push -2;
      pop ebp;
      push esi;
shaderlp:
      mov esi, [esp];
      lea eax, [ebp+GL_VERTEX_SHADER+1];
      push eax;
      lodsd;
      call eax;
      
      push eax;
      push edi;
      push eax;
      push 0;
      push dword ptr [esp+44+ebp*4];
      push 1;
      push eax;
      
      lodsd;
      call eax;
      lodsd;
      call eax;
      lodsd;
      call eax;
      inc ebp;
      jnz shaderlp;

You may find this produces a smaller executable in the end:

Code:

      push esi;

      mov esi, [esp];
      lea eax, [GL_VERTEX_SHADER-1];
      push eax;
      lodsd;
      call eax;
      
      push eax;
      push edi;
      push eax;
      push 0;
      push dword ptr [esp+36];
      push 1;
      push eax;
      
      lodsd;
      call eax;
      lodsd;
      call eax;
      lodsd;
      call eax;

      mov esi, [esp];
      lea eax, [GL_VERTEX_SHADER];
      push eax;
      lodsd;
      call eax;
      
      push eax;
      push edi;
      push eax;
      push 0;
      push dword ptr [esp+40];
      push 1;
      push eax;
      
      lodsd;
      call eax;
      lodsd;
      call eax;
      lodsd;
      call eax;

Eh. My x86 is rusty and I didn't actually consider what the code does, but you get the idea. The point is the code for repeating operations is already there in the unpacker, try to rely on that.
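To put a rough number on "repeats are nearly free", here is a sketch under stated assumptions: zlib stands in for the real packer, and the 32-byte `body` is just SHA-256 output playing the role of a loop body (all names are hypothetical). The second copy differs in one byte, like the changed constant in the unrolled listing.

```python
import hashlib
import zlib

def packed_size(data):
    # zlib is only a stand-in for the real packer
    return len(zlib.compress(data, 9))

# Hypothetical 32-byte "loop body"; SHA-256 output is used only because
# it has no internal redundancy of its own.
body = hashlib.sha256(b"loop body").digest()

# Second unrolled copy: identical except for one byte.
second = bytearray(body)
second[16] ^= 0xFF
second = bytes(second)

once = packed_size(body)
unrolled = packed_size(body + second)
extra = unrolled - once

print(f"one copy: {once} bytes packed, two copies: {unrolled} bytes packed")
print(f"the second copy cost {extra} packed bytes for {len(second)} raw bytes")
```

Whether unrolling beats the loop overhead in a real intro still has to be checked against the crinkler report for that build.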

6) Make your data more suitable for entropy encoding. For an array of coordinates making up a camera path, say, the distance from one point to the next in the sequence is "small", so store offsets rather than absolute coordinates, and the resulting values will cluster in a much narrower range. Same thing goes for colours across gradients, etc.

7) Don't use data types that are more precise than they need to be. It doesn't necessarily matter if control points along a spline are defined in 8-bit coordinates. It will still look smooth if you interpolate precisely.
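A small sketch of point 7, with assumptions spelled out: the control points are invented, and piecewise-linear interpolation stands in for a real spline to keep it short. Each point is stored in one byte, expanded to float at run time, and interpolated in full precision; the result never drifts more than half a quantization step from the float version.

```python
import math

# Hypothetical control points of a smooth curve, as floats in [0, 1].
control = [0.5 + 0.5 * math.sin(i * 0.7) for i in range(16)]

# Store each control point in a single byte instead of a 4-byte float.
stored = bytes(round(c * 255) for c in control)

def lerp_curve(points, t):
    """Piecewise-linear interpolation; t runs from 0 to len(points) - 1."""
    i = min(int(t), len(points) - 2)
    frac = t - i
    return points[i] * (1 - frac) + points[i + 1] * frac

# At run time the bytes are expanded back to floats, then interpolated precisely.
decoded = [b / 255 for b in stored]

samples = [k * 15 / 1000 for k in range(1001)]
max_error = max(
    abs(lerp_curve(control, t) - lerp_curve(decoded, t)) for t in samples
)

print(f"{len(stored)} bytes stored, max deviation {max_error:.5f}")
```

The bound holds because a linear blend of two points can't be further off than the worse of the two; a real spline with overshoot would need its own check.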

And so on.
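Going back to point 6, a minimal sketch of offset storage, assuming a made-up one-coordinate camera path and using zlib as a stand-in for the real packer (`path`, `deltas` and `pack16` are hypothetical names). Both versions are stored as 16-bit words so the comparison is fair.

```python
import math
import struct
import zlib
from itertools import accumulate

# Hypothetical camera path: one coordinate sampled along a gentle curve.
path = [round(2000 * math.sin(i / 200)) for i in range(256)]

# Store the first point, then only the step to each following point.
deltas = [path[0]] + [b - a for a, b in zip(path, path[1:])]

def pack16(values):
    """Serialize as little-endian signed 16-bit words."""
    return struct.pack(f"<{len(values)}h", *values)

# zlib is only a stand-in for the real packer
absolute_size = len(zlib.compress(pack16(path), 9))
delta_size = len(zlib.compress(pack16(deltas), 9))

# Decoding is just a running sum.
restored = list(accumulate(deltas))

print(f"absolute: {absolute_size} bytes packed, deltas: {delta_size} bytes packed")
```

The decoder is a running sum, which is only a handful of instructions in the intro itself.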

added on the 2009-05-03 11:51:14 by doomdoom

@lx, about sse, take into account that you will probably need sin, cos, tan, exp, pow, fmod, and similar functions to make your objects/camera paths/textures... and you don't have such instructions in sse. In sse you only have +, -, *, /, sqrt, 1/sqrt, min, max. So if you need them you will have to mix sse and fpu code, constantly doing ugly data movement from xmm? to memory and from memory to st0... My guess is that plain fpu code will be smaller.

added on the 2009-05-03 11:58:00 by iq

a bit off-topic, but in asmone/pro (for amiga) you can do this:

Code:


a:

foo:
       ; some code

b:
    printt "size of foo"
    printv b - a

This prints the size of foo at assembly time, which is quite useful when size optimizing. Is there something similar for nasm?

added on the 2009-05-03 17:28:20 by emoon

as for the FPU being difficult/annoying/whatever, you usually don't have to keep track of that much data on the stack at one time...but if that's giving you issues you can always take a pad and paper and keep track of your FPU registers that way. Another thing I used to do in tinycoding (<=256b) was use comments like this:

Code:


fld1 ; st0: 1
fldz ; st0: 0, st1: 1

etc.

Also the crinkler compression report is your best friend. Don't forget this ;)

added on the 2009-05-03 19:53:34 by ferris

*pen and paper I mean ;)

added on the 2009-05-03 19:53:59 by ferris

emoon:

Code:

times 100 nop
%warning "Total compiled size is:"
%assign size $-$$
%warning size

added on the 2009-05-03 20:17:00 by fr33ke

fr33ke:

Thanks. That wasn't that obvious :) I haven't been coding x86 asm for ages so I was just a bit curious.

added on the 2009-05-03 21:11:00 by emoon

added on the 2009-05-06 13:40:03 by 24

pouët.net
