Bad compression ratio for assembler code under crinkler?
category: general [glöplog]
Yup, OpenGL FSAA is just plain huge.
i am glad i don't code as this thread confounds and scares me
Quote:
i am glad i don't code as this thread confounds and scares me
Really? I think we've just scratched the surface...
+1! Great Thread.
obviously more size efficient code will have a worse compression ratio than badly size efficient code. what do you think compression means?
on another note, what do you think the word "ratio" means?
Hatikvah: A little harsh...
But what he means is true; compressed SIZE is really what you want to compare...size-optimized code in general will have worse compression ratios than long, repetitive code, so the compression ratio really isn't important in comparisons.
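To put rough numbers on that, here is a toy sketch in Python. Everything in it is an assumption for illustration: zlib stands in for crinkler (which uses a different, context-modelling compressor), and the byte strings are made-up stand-ins for code, not real x86. `tight` plays the size-optimized program, `verbose` plays a longer program with the same boilerplate repeated all over it.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable


def noise(n):
    """Return n pseudo-random bytes, a stand-in for dense, non-repetitive code."""
    return bytes(rng.getrandbits(8) for _ in range(n))


# "Size-optimized" program: short, with nothing left for the packer to exploit.
tight = noise(300)

# "Verbose" program: more unique bytes, plus the same boilerplate after every 8 of them.
boilerplate = bytes([0x55, 0x8B, 0xEC, 0x53, 0x56, 0x57, 0x90, 0x90])  # arbitrary filler
unique = noise(400)
verbose = b"".join(unique[i:i + 8] + boilerplate for i in range(0, len(unique), 8))

tight_packed = len(zlib.compress(tight, 9))
verbose_packed = len(zlib.compress(verbose, 9))

for name, raw, packed in (("tight", len(tight), tight_packed),
                          ("verbose", len(verbose), verbose_packed)):
    print(f"{name:8s} raw={raw:4d} packed={packed:4d} ratio={packed / raw:.2f}")
```

The verbose blob gets the better ratio and still ends up as the bigger file, which is why only the packed size is worth comparing.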
Quote:
obviously more size efficient code will have a worse compression ratio than badly size efficient code
Indeed, the title of this thread is confusing.
Thanks Blueberry, this is helpful.
So, to summarize, to achieve better compression of ASM code in 4k intros:
1) I need to know x86 asm better, with size optimization in mind. At least, this point is for me. ;)
2) Produce code that is regular, with a fixed set of instructions, thus providing predictable redundancy throughout the code.
3) Coding the whole demo in ASM is the key to good compression, since it avoids irregularity between differently generated ASM (mixed C++/asm vs pure asm).
I have one more question: I don't feel confident with FPU code and would rather use the SSE instruction set (through the xmm0-xmm7 registers), but FPU instructions seem to produce shorter code. On the other hand, SSE registers make it easier to optimize across a large routine (for example a softsynth).
Do you think SSE is too unfriendly in instruction size to use in a 4k?
Quote:
i actually finished and debugged the kkrunchy 0.23a3 (and following versions) compressor under linux
Quote:
pls to release :(
+1
(sorry for thread-jacking, but I felt like I need to endure this)
4) Try to repeat sequences even if it gives you redundant code. As mentioned, preserve registers you don't need to preserve if it makes the push/pop sequences surrounding all subroutines identical. When you do similar things in two places, try to make them identical things, or at least as similar as possible with regards to register usage etc.
5) Repeating sequences pack so well that it very often makes sense to unroll loops to avoid the loop overhead. E.g. in ryg's example:
Code:
push -2;
pop ebp;
push esi;
shaderlp:
mov esi, [esp];
lea eax, [ebp+GL_VERTEX_SHADER+1];
push eax;
lodsd;
call eax;
push eax;
push edi;
push eax;
push 0;
push dword ptr [esp+44+ebp*4];
push 1;
push eax;
lodsd;
call eax;
lodsd;
call eax;
lodsd;
call eax;
inc ebp;
jnz shaderlp;
You may find this produces a smaller executable in the end:
Code:
push esi;
mov esi, [esp];
lea eax, [GL_VERTEX_SHADER-1];
push eax;
lodsd;
call eax;
push eax;
push edi;
push eax;
push 0;
push dword ptr [esp+36];
push 1;
push eax;
lodsd;
call eax;
lodsd;
call eax;
lodsd;
call eax;
mov esi, [esp];
lea eax, [GL_VERTEX_SHADER];
push eax;
lodsd;
call eax;
push eax;
push edi;
push eax;
push 0;
push dword ptr [esp+40];
push 1;
push eax;
lodsd;
call eax;
lodsd;
call eax;
lodsd;
call eax;
Eh. My x86 is rusty and I didn't actually consider what the code does, but you get the idea. The point is the code for repeating operations is already there in the unpacker, try to rely on that.
6) Make your data more suitable for entropy encoding. For an array of coordinates making up a camera path, say, the distance from one point to the next in the sequence is "small" so store offsets rather than absolute coordinates, and the resulting data will be more densely distributed. Same thing goes for colours across gradients, etc.
7) Don't use data types that are more precise than they need to be. It doesn't necessarily matter if control points along a spline are defined in 8-bit coordinates. It will still look smooth if you interpolate precisely.
And so on.
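Points 5 and 6 are easy to try out with any general-purpose compressor. The following is only a sketch under loud assumptions: zlib stands in for the real packer (crinkler models data differently, so the exact numbers will not carry over), the "loop body" is made-up bytes rather than assembled code, and the camera path is a hypothetical single 16-bit coordinate moving forward at a slightly varying speed.

```python
import random
import struct
import zlib

rng = random.Random(1)  # fixed seed so the run is repeatable


def packed_size(blob):
    """Size of blob after zlib at maximum effort (a stand-in for the real packer)."""
    return len(zlib.compress(blob, 9))


# Point 5: a second, nearly identical copy of a sequence costs very little.
body = bytes(rng.getrandbits(8) for _ in range(48))      # made-up "loop body" bytes
variant = body[:20] + bytes([body[20] ^ 1]) + body[21:]  # same body, one constant changed
once = packed_size(body)
twice = packed_size(body + variant)
print("one copy:", once, "two copies:", twice)

# Point 6: store offsets instead of absolute coordinates.
path = [1000]
for _ in range(999):
    path.append(path[-1] + 40 + rng.randint(-3, 3))

# The first entry stays absolute, every later entry is the step from the previous point.
deltas = [path[0]] + [b - a for a, b in zip(path, path[1:])]

# Same width for both layouts, so only the value distribution differs.
absolute_bytes = struct.pack(f"<{len(path)}H", *path)
delta_bytes = struct.pack(f"<{len(deltas)}h", *deltas)
absolute_packed = packed_size(absolute_bytes)
delta_packed = packed_size(delta_bytes)
print("absolute:", absolute_packed, "deltas:", delta_packed)

# At runtime the path is rebuilt with a running sum.
decoded = []
position = 0
for step in deltas:
    position += step
    decoded.append(position)
```

The running sum at the end is the only extra work the intro has to do, so the decoder cost of the delta layout is a few instructions.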
@lx, about sse: take into account that you will probably need sin, cos, tan, exp, pow, fmod, and similar functions to make your objects/camera paths/textures... and sse has no such instructions. In sse you only have +, -, *, /, sqrt, 1/sqrt, min, max. So if you need them you will have to mix sse and fpu code, constantly doing ugly data movement from xmm? to memory and from memory to st0... My guess is that plain fpu code will be smaller.
a bit off-topic, but in asmone/pro (for amiga) you can do this:
Code:
a:
foo:
; some code
b:
printt "size of foo"
printv b - a
This prints the size of foo at compile time, which is quite useful when size optimizing. Is there something similar for nasm?
as far as the FPU being difficult/annoying/whatever goes, you really don't usually have to keep track of that much data on the stack at one time...but if that's giving you issues you can always take a pad and paper and keep track of your FPU registers that way. Another thing I used to do in tinycoding (<=256b) was use comments like this:
Code:
fld1 ; st0: 1
fldz ; st0: 0, st1: 1
etc.
Also the crinkler compression report is your best friend. Don't forget this ;)
*pen and paper I mean ;)
emoon:
Code:
times 100 nop
%warning "Total compiled size is:"
%assign size $-$$
%warning size
fr33ke:
Thanks. That wasn't that obvious :) I haven't been coding x86 asm for ages so I was just a bit curious.