Monday, 30 January 2017

JIT: Field of interface type and single implementation.

In my previous post the topic was devirtualization, an optimization applied by the JIT to reduce the overhead of virtual method calls.

In this post I'll analyze the implications for virtual method calls on a field with an interface-based type when only a single implementation of that interface is loaded. This is very common in a lot of applications and is considered good practice, e.g. to simplify testing by stubbing or mocking the dependency. In production, however, there will only be a single loaded implementation of this interface.

For this post I'll use the following interface. It has a single method 'size' that returns an arbitrary value.
interface Service {
    int size();
}
There is also a very basic implementation with a 'size' method that can easily be inlined:
class ServiceImpl implements Service {
    private int size = 1;

    @Override
    public int size() {
        return size;
    }
}
And we have the following program:
public class SingleImplementation {

    private final Service service;

    public SingleImplementation(Service service) {
        this.service = service;
    }

    public int sizePlusOne() {
        return service.size() + 1;
    }

    public static void main(String[] args) {
        SingleImplementation f = new SingleImplementation(new ServiceImpl());

        int result = 0;

        for (int k = 0; k < 100_000; k++) {
            result += f.sizePlusOne();
        }

        System.out.println("result:" + result);
    }
}
The focus will be on the 'sizePlusOne' method. The logic isn't very exciting, but it produces easy-to-understand Assembly.

For this post we have the following assumptions:

  • we only care about the output of the C2 compiler
  • we are using Java HotSpot 1.8.0_91

Assembly

The question is whether there is any price to pay for using an interface as a field type.

To determine this we'll be outputting the Assembly using the following JVM settings:

-XX:+UnlockDiagnosticVMOptions
-XX:PrintAssemblyOptions=intel
-XX:-TieredCompilation
-XX:-Inline
-XX:CompileCommand=print,*SingleImplementation.sizePlusOne
Tiered compilation is disabled since we only care about the C2 Assembly. Inlining is disabled so we can focus on the 'sizePlusOne' method itself; otherwise it would be inlined into the 'main' method, and the JIT could eliminate the whole loop and calculate the end value directly.

After running the program the following Assembly is emitted:

Compiled method (c2)     169    8             com.liskov.SingleImplementation::sizePlusOne (12 bytes)
 total in heap  [0x000000010304d0d0,0x000000010304d3a8] = 728
 relocation     [0x000000010304d1f0,0x000000010304d208] = 24
 main code      [0x000000010304d220,0x000000010304d2a0] = 128
 stub code      [0x000000010304d2a0,0x000000010304d2b8] = 24
 oops           [0x000000010304d2b8,0x000000010304d2c0] = 8
 metadata       [0x000000010304d2c0,0x000000010304d2d0] = 16
 scopes data    [0x000000010304d2d0,0x000000010304d300] = 48
 scopes pcs     [0x000000010304d300,0x000000010304d390] = 144
 dependencies   [0x000000010304d390,0x000000010304d398] = 8
 nul chk table  [0x000000010304d398,0x000000010304d3a8] = 16
Loaded disassembler from /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib
Decoding compiled method 0x000000010304d0d0:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Constants]
  # {method} {0x00000001cb5b7478} 'sizePlusOne' '()I' in 'com/liskov/SingleImplementation'
  #           [sp+0x20]  (sp of caller)
  0x000000010304d220: mov    r10d,DWORD PTR [rsi+0x8]
  0x000000010304d224: shl    r10,0x3
  0x000000010304d228: cmp    rax,r10
  0x000000010304d22b: jne    0x0000000103019b60  ;   {runtime_call}
  0x000000010304d231: data32 xchg ax,ax
  0x000000010304d234: nop    DWORD PTR [rax+rax*1+0x0]
  0x000000010304d23c: data32 data32 xchg ax,ax
[Verified Entry Point]
  0x000000010304d240: mov    DWORD PTR [rsp-0x14000],eax
  0x000000010304d247: push   rbp
  0x000000010304d248: sub    rsp,0x10           ;*synchronization entry
                                                ; - com.liskov.SingleImplementation::sizePlusOne@-1 (line 12)

  0x000000010304d24c: mov    r11d,DWORD PTR [rsi+0xc]  ;*getfield service
                                                ; - com.liskov.SingleImplementation::sizePlusOne@1 (line 12)

  0x000000010304d250: mov    r10d,DWORD PTR [r12+r11*8+0x8]
                                                ; implicit exception: dispatches to 0x000000010304d289
  0x000000010304d255: cmp    r10d,0x31642b42    ;   {metadata('com/liskov/ServiceImpl')}
  0x000000010304d25c: jne    0x000000010304d274
  0x000000010304d25e: lea    r10,[r12+r11*8]    ;*invokeinterface size
                                                ; - com.liskov.SingleImplementation::sizePlusOne@4 (line 12)

  0x000000010304d262: mov    eax,DWORD PTR [r10+0xc]
  0x000000010304d266: inc    eax                ;*iadd
                                                ; - com.liskov.SingleImplementation::sizePlusOne@10 (line 12)

  0x000000010304d268: add    rsp,0x10
  0x000000010304d26c: pop    rbp
  0x000000010304d26d: test   DWORD PTR [rip+0xfffffffffe7b1d8d],eax        # 0x00000001017ff000
                                                ;   {poll_return}
  0x000000010304d273: ret    
  0x000000010304d274: mov    esi,0xffffffde
  0x000000010304d279: mov    ebp,r11d
  0x000000010304d27c: data32 xchg ax,ax
  0x000000010304d27f: call   0x000000010301b120  ; OopMap{rbp=NarrowOop off=100}
                                                ;*invokeinterface size
                                                ; - com.liskov.SingleImplementation::sizePlusOne@4 (line 12)
                                                ;   {runtime_call}
  0x000000010304d284: call   0x0000000102440e44  ;   {runtime_call}
  0x000000010304d289: mov    esi,0xfffffff6
  0x000000010304d28e: nop
  0x000000010304d28f: call   0x000000010301b120  ; OopMap{off=116}
                                                ;*invokeinterface size
                                                ; - com.liskov.SingleImplementation::sizePlusOne@4 (line 12)
                                                ;   {runtime_call}
  0x000000010304d294: call   0x0000000102440e44  ;*invokeinterface size
                                                ; - com.liskov.SingleImplementation::sizePlusOne@4 (line 12)
                                                ;   {runtime_call}
  0x000000010304d299: hlt    
  0x000000010304d29a: hlt    
  0x000000010304d29b: hlt    
  0x000000010304d29c: hlt    
  0x000000010304d29d: hlt    
  0x000000010304d29e: hlt    
  0x000000010304d29f: hlt    
[Exception Handler]
[Stub Code]
  0x000000010304d2a0: jmp    0x000000010303ff60  ;   {no_reloc}
[Deopt Handler Code]
  0x000000010304d2a5: call   0x000000010304d2aa
  0x000000010304d2aa: sub    QWORD PTR [rsp],0x5
  0x000000010304d2af: jmp    0x000000010301ad00  ;   {runtime_call}
  0x000000010304d2b4: hlt    
  0x000000010304d2b5: hlt    
  0x000000010304d2b6: hlt    
  0x000000010304d2b7: hlt    
OopMapSet contains 2 OopMaps

#0 
OopMap{rbp=NarrowOop off=100}
#1 
OopMap{off=116}

The following code is the relevant Assembly:

  0x000000010304d24c: mov    r11d,DWORD PTR [rsi+0xc]
    ; load the (compressed) reference in the 'service' field into r11d
  0x000000010304d250: mov    r10d,DWORD PTR [r12+r11*8+0x8]
    ; load the (compressed) class pointer of the service object into r10d
  0x000000010304d255: cmp    r10d,0x31642b42
    ; compare that class pointer with the one for the ServiceImpl class
  0x000000010304d25c: jne    0x000000010304d274
    ; if it isn't of type ServiceImpl, jump to the uncommon trap
  0x000000010304d25e: lea    r10,[r12+r11*8]
    ; decode the compressed reference to the service object into r10
  0x000000010304d262: mov    eax,DWORD PTR [r10+0xc]
    ; copy the size field into eax
  0x000000010304d266: inc    eax
    ; add 1 to eax
We can conclude that the JIT has removed the virtual call and has even inlined the ServiceImpl.size method. However, we can also see a typeguard. This is a bit surprising because there is only a single Service implementation loaded. If a conflicting class were loaded in the future, class hierarchy analysis should spot this and trigger a deoptimization of the code. Therefore there should be no need for a guard and an uncommon trap.
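
To make the shape of the compiled code concrete, here is a rough Java-level sketch of what the annotated Assembly does. The names GuardedInlineSketch and sizePlusOneGuarded are hypothetical and only serve the illustration. The JIT works on machine code rather than Java source, and its slow path is an uncommon trap that deoptimizes the method, not an ordinary interface call as in this sketch.

```java
public class GuardedInlineSketch {

    interface Service {
        int size();
    }

    static final class ServiceImpl implements Service {
        private int size = 1;

        @Override
        public int size() {
            return size;
        }
    }

    // Hypothetical hand-written equivalent of the guarded, inlined call.
    static int sizePlusOneGuarded(Service service) {
        // Typeguard: compare the exact class of the receiver with ServiceImpl.
        // A null 'service' fails here, much like the implicit null check
        // in the compiled code.
        if (service.getClass() == ServiceImpl.class) {
            // Fast path: the body of ServiceImpl.size() inlined as a field read.
            return ((ServiceImpl) service).size + 1;
        }
        // Slow path: stands in for the uncommon trap. The real compiled code
        // would deoptimize here instead of making a normal call.
        return service.size() + 1;
    }

    public static void main(String[] args) {
        // Takes the fast path.
        System.out.println(sizePlusOneGuarded(new ServiceImpl())); // prints 2

        // A different implementation takes the slow path.
        Service other = () -> 41;
        System.out.println(sizePlusOneGuarded(other)); // prints 42
    }
}
```

The guard costs one load, one compare and one branch on every call, which matches the extra instructions visible in the Assembly above.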

Abstract class to the rescue

I posted the question of why this typeguard was added on the hotspot-compiler-dev mailing list. Luckily, Aleksey Shipilev pointed out the cause of the problem: class hierarchy analysis in HotSpot doesn't deal with interfaces correctly.

Therefore I switched to an abstract class to see if the JIT is able to get rid of this typeguard.

abstract class Service {
   abstract int size();
}

class ServiceImpl extends Service {
    private int size = 1;

    @Override
    public int size() {
        return size;
    }
}
If the program is rerun, the following Assembly is emitted:
Compiled method (c2)     186    8             com.liskov.SingleImplementation::sizePlusOne (10 bytes)
 total in heap  [0x000000010563c250,0x000000010563c4d0] = 640
 relocation     [0x000000010563c370,0x000000010563c380] = 16
 main code      [0x000000010563c380,0x000000010563c3e0] = 96
 stub code      [0x000000010563c3e0,0x000000010563c3f8] = 24
 oops           [0x000000010563c3f8,0x000000010563c400] = 8
 metadata       [0x000000010563c400,0x000000010563c420] = 32
 scopes data    [0x000000010563c420,0x000000010563c448] = 40
 scopes pcs     [0x000000010563c448,0x000000010563c4b8] = 112
 dependencies   [0x000000010563c4b8,0x000000010563c4c0] = 8
 nul chk table  [0x000000010563c4c0,0x000000010563c4d0] = 16
Loaded disassembler from /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib
Decoding compiled method 0x000000010563c250:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Constants]
  # {method} {0x000000010d964478} 'sizePlusOne' '()I' in 'com/liskov/SingleImplementation'
  #           [sp+0x20]  (sp of caller)
  0x000000010563c380: mov    r10d,DWORD PTR [rsi+0x8]
  0x000000010563c384: shl    r10,0x3
  0x000000010563c388: cmp    rax,r10
  0x000000010563c38b: jne    0x0000000105607b60  ;   {runtime_call}
  0x000000010563c391: data32 xchg ax,ax
  0x000000010563c394: nop    DWORD PTR [rax+rax*1+0x0]
  0x000000010563c39c: data32 data32 xchg ax,ax
[Verified Entry Point]
  0x000000010563c3a0: mov    DWORD PTR [rsp-0x14000],eax
  0x000000010563c3a7: push   rbp
  0x000000010563c3a8: sub    rsp,0x10           ;*synchronization entry
                                                ; - com.liskov.SingleImplementation::sizePlusOne@-1 (line 12)

  0x000000010563c3ac: mov    r11d,DWORD PTR [rsi+0xc]  ;*getfield service
                                                ; - com.liskov.SingleImplementation::sizePlusOne@1 (line 12)

  0x000000010563c3b0: mov    eax,DWORD PTR [r12+r11*8+0xc]
                                                ; implicit exception: dispatches to 0x000000010563c3c3
  0x000000010563c3b5: inc    eax                ;*iadd
                                                ; - com.liskov.SingleImplementation::sizePlusOne@8 (line 12)

  0x000000010563c3b7: add    rsp,0x10
  0x000000010563c3bb: pop    rbp
  0x000000010563c3bc: test   DWORD PTR [rip+0xfffffffffe790c3e],eax        # 0x0000000103dcd000
                                                ;   {poll_return}
  0x000000010563c3c2: ret    
  0x000000010563c3c3: mov    esi,0xfffffff6
  0x000000010563c3c8: data32 xchg ax,ax
  0x000000010563c3cb: call   0x0000000105609120  ; OopMap{off=80}
                                                ;*invokevirtual size
                                                ; - com.liskov.SingleImplementation::sizePlusOne@4 (line 12)
                                                ;   {runtime_call}
  0x000000010563c3d0: call   0x0000000104a40e44  ;*invokevirtual size
                                                ; - com.liskov.SingleImplementation::sizePlusOne@4 (line 12)
                                                ;   {runtime_call}
  0x000000010563c3d5: hlt    
  0x000000010563c3d6: hlt    
  0x000000010563c3d7: hlt    
  0x000000010563c3d8: hlt    
  0x000000010563c3d9: hlt    
  0x000000010563c3da: hlt    
  0x000000010563c3db: hlt    
  0x000000010563c3dc: hlt    
  0x000000010563c3dd: hlt    
  0x000000010563c3de: hlt    
  0x000000010563c3df: hlt    
[Exception Handler]
[Stub Code]
  0x000000010563c3e0: jmp    0x000000010562df60  ;   {no_reloc}
[Deopt Handler Code]
  0x000000010563c3e5: call   0x000000010563c3ea
  0x000000010563c3ea: sub    QWORD PTR [rsp],0x5
  0x000000010563c3ef: jmp    0x0000000105608d00  ;   {runtime_call}
  0x000000010563c3f4: hlt    
  0x000000010563c3f5: hlt    
  0x000000010563c3f6: hlt    
  0x000000010563c3f7: hlt    
OopMapSet contains 1 OopMaps

#0 
OopMap{off=80}
After removing all noise, the following Assembly remains:

  0x000000010563c3ac: mov    r11d,DWORD PTR [rsi+0xc]
    ; load the service field into register r11d
  0x000000010563c3b0: mov    eax,DWORD PTR [r12+r11*8+0xc]
    ; load the size field of the service into register eax
  0x000000010563c3b5: inc    eax
    ; add 1 to eax
We can immediately see that the typeguard is gone.
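
Without a guard, the compiled code is only valid as long as ServiceImpl remains the sole concrete subclass of Service. The following is a minimal sketch for experimenting with what happens when a second subclass shows up after warm-up. The names LateSubclassSketch, OtherServiceImpl and Late are hypothetical. The sketch also assumes that the JVM loads OtherServiceImpl lazily, which is typical but not guaranteed. The program's output is the same either way; only the JIT's behavior behind the scenes differs.

```java
public class LateSubclassSketch {

    abstract static class Service {
        abstract int size();
    }

    static class ServiceImpl extends Service {
        private int size = 1;

        @Override
        int size() {
            return size;
        }
    }

    // Hypothetical second subclass that only appears after the warm-up.
    static class OtherServiceImpl extends Service {
        @Override
        int size() {
            return 41;
        }
    }

    // Separate holder class, so that OtherServiceImpl is (assumption) not
    // touched until Late.create() is called for the first time.
    static final class Late {
        static Service create() {
            return new OtherServiceImpl();
        }
    }

    private final Service service;

    LateSubclassSketch(Service service) {
        this.service = service;
    }

    int sizePlusOne() {
        return service.size() + 1;
    }

    // Same loop as in the original program: 100_000 calls that each return 2.
    static int warmUp() {
        LateSubclassSketch f = new LateSubclassSketch(new ServiceImpl());
        int result = 0;
        for (int k = 0; k < 100_000; k++) {
            result += f.sizePlusOne();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("warm-up: " + warmUp()); // prints warm-up: 200000

        // A second subclass is now introduced; the result must stay correct.
        LateSubclassSketch g = new LateSubclassSketch(Late.create());
        System.out.println("late subclass: " + g.sizePlusOne()); // prints late subclass: 42
    }
}
```

Running this with -XX:+PrintCompilation may show the earlier compilation of sizePlusOne being marked "made not entrant" once the second subclass is loaded. Whether and when that line appears depends on the JVM version and on inlining decisions.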

Conclusions

If only a single implementation is loaded and the field has an:
  1. interface-based type, the method gets inlined but a typeguard is added.
  2. abstract-class-based type, the method gets inlined and there is no typeguard.
So in the case of an abstract-class-based field, there is no penalty to pay for applying the Liskov Substitution Principle. In the case of an interface-based field, there is an extra, unwanted typeguard.

Note

Please don't go and replace all your interfaces with abstract classes. In most cases code will be slow for reasons other than a simple typeguard. If the guard gets executed frequently, the branch predictor will make the right prediction. And hopefully the problems in the class hierarchy analysis will get resolved in the near future.
