Downcasting Longs To Ints On x86
Last week, my esteemed colleague and close friend asked a remarkably straightforward question about downcasting a long to an int in Java. I'll admit the question caught me off guard. While the JLS offered the correct answer (a narrowing conversion simply discards all but the low-order 32 bits), I couldn't help but ponder what's actually happening in the machine.
In this article I'm going to try to explain what actually happens (on x86) when you downcast (or narrow) a 64-bit long to a 32-bit int. I will work my way down from Java bytecode, through the JVM (focusing on HotSpot), down to the CPU. The answer is pretty simple (hint: not much), but getting to the answer is certainly an interesting lesson. As always, should this article contain any mistakes or misinformation, I would appreciate a heads-up.
Bytecode
The Java bytecode responsible for downcasting a long to an int is l2i (long to int). The bytecode expects a long to be on the top of the operand stack (in JVM lingo, this precondition is known as ltos, or "the top of the stack is a long") and will finish with an int on the top of the operand stack (itos). As a demonstration, the following code will generate a fairly straightforward example:
public static void main(final String[] args)
{
    final long g = Long.parseLong(args[0]);
    final int i = (int)g;
    System.out.println(i);
}
This will produce the following bytecode:
public static void main(java.lang.String[]);
Code:
stack=2, locals=4, args_size=1
0: aload_0
1: iconst_0
2: aaload
3: invokestatic #2 // Method java/lang/Long.parseLong:(Ljava/lang/String;)J
6: lstore_1
7: lload_1
8: l2i
9: istore_3
10: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
13: iload_3
14: invokevirtual #4 // Method java/io/PrintStream.println:(I)V
17: return
Notice the aforementioned l2i instruction at bytecode index (BCI) 8. As expected, the lload_1 at BCI 7 loads a long onto the top of the stack and, as evidenced by the istore_3 at BCI 9, l2i replaces it with an int. At this point, we can step back and let the JVM do its thing.
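To make the effect of l2i concrete, here is a minimal sketch of what the cast computes at the language level. Narrowing keeps only the low-order 32 bits, so the result can silently change sign or magnitude. The class and method names are made up for illustration; Math.toIntExact is the standard library's checked alternative and is used here only to flag the lossy cases.

```java
// Illustrative sketch only: the class and method names are hypothetical.
final class NarrowingDemo
{
    // The cast below compiles to a single l2i: only the low-order 32 bits survive.
    static int narrow(final long value)
    {
        return (int) value;
    }

    // True when the value does not fit in an int (the checked conversion rejects it).
    static boolean overflows(final long value)
    {
        try
        {
            Math.toIntExact(value);
            return false;
        }
        catch (final ArithmeticException e)
        {
            return true;
        }
    }

    public static void main(final String[] args)
    {
        final long[] samples = { 42L, -42L, 1L << 32, Integer.MAX_VALUE + 1L, Long.MAX_VALUE };
        for (final long sample : samples)
        {
            System.out.println(sample + " -> " + narrow(sample)
                + (overflows(sample) ? " (information lost)" : ""));
        }
    }
}
```

Values that already fit in 32 bits come through unchanged; everything else wraps around without any exception being thrown.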
Worth mentioning is that the Java compiler is smart enough to optimize away unnecessary casts. For example, if the source long is a compile-time constant, the compiler can simply calculate the resultant int and store that in the constant pool instead. This can be demonstrated with the following example:
public static void main(final String[] args)
{
    final long l = 50000000000L;
    final int i = (int)l;
    System.out.println(i);
}
Which produces the much more compact bytecode:
public static void main(java.lang.String[]);
Code:
stack=2, locals=4, args_size=1
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #3 // int -1539607552
5: invokevirtual #4 // Method java/io/PrintStream.println:(I)V
8: return
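The folded constant can be checked by hand. The sketch below redoes the narrowing with plain arithmetic (mask off the low 32 bits, then reinterpret bit 31 as the sign) and compares it with the cast. The class and method names are hypothetical, chosen only for this example.

```java
// Illustrative sketch only: the class and method names are hypothetical.
final class ConstantFoldDemo
{
    // Performs the long-to-int narrowing arithmetically, without a cast.
    static long lowWordAsSigned(final long value)
    {
        final long twoTo32 = 1L << 32;
        long low = value & (twoTo32 - 1);   // low-order 32 bits, in the range 0 .. 2^32 - 1
        if (low >= (1L << 31))              // bit 31 set: the int is negative
        {
            low -= twoTo32;
        }
        return low;
    }

    public static void main(final String[] args)
    {
        final long l = 50000000000L;
        System.out.println(l & 0xFFFFFFFFL);    // 2755359744
        System.out.println(lowWordAsSigned(l)); // -1539607552
        System.out.println((int) l);            // -1539607552
    }
}
```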
JVM
On x86, the JVM doesn't actually have to do much (which makes this entire topic interesting). Unfortunately, there's a lot of complexity behind the little that has to be done. Because this topic is extremely complex and is an area with which I am still somewhat unfamiliar (and the focus of another article), I will try to keep the explanation at a high level.
In a nutshell, the JVM is an interpreter. There are different flavours of interpreters, but the one most commonly used (i.e. the ubiquitous "interpreter") is the Template Interpreter. Each time a JVM starts up, this interpreter is generated (yes, at runtime). Essentially, the interpreter consists of a "template" for each and every Java bytecode. Each template is created by emitting assembly, which is turned into native machine code on the spot. Accordingly, there is a template generated for the l2i bytecode.
When a template is invoked by the interpreter, the TOS element may be passed in different ways (in fact, each template may have different types of entry points allowing for a type of dynamic invocation). In some instances, it's passed in a register and other times it's passed on the native stack.
CPU
It's hard to generalize what the l2i template looks like since each processor (and operating system) will have a slightly different version (remember, the template is generated at runtime). Because we're focusing on x86, there are two differing areas of focus: 32-bit and 64-bit.
x86_32
For the l2i template, the ltos is passed on the native stack. To continue, the ltos must be popped off the stack. Because x86's general-purpose registers are 32 bits wide, the l2i template must use two 32-bit registers to hold the ltos. In this case, EAX and EDX: EAX holds the low-order bits and EDX holds the high-order bits.
This template stores the resultant itos in the EAX register. To perform the actual cast, the template need only pop the ltos from the stack into the two registers:
0xf36d4160: pop %eax
0xf36d4161: pop %edx
Now, EAX holds the low 32 bits, which represent the downcast int. That's it.
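The two pops can be modelled in Java as a rough sketch. It assumes, as described above, that the long occupies two 32-bit slots on the native stack with the low-order word on top. The class, method, and variable names are hypothetical, and splitting the long into halves relies on Java's own cast, so this illustrates the stack layout rather than independently proving anything.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only: names are hypothetical; this is not HotSpot code.
final class TwoPopsSketch
{
    // Stores a long as two 32-bit slots, low-order word on top (assumed layout).
    static void pushLong(final Deque<Integer> nativeStack, final long value)
    {
        nativeStack.push((int) (value >>> 32)); // high-order 32 bits
        nativeStack.push((int) value);          // low-order 32 bits, now top of stack
    }

    // Models the template: two pops, with the result left in "EAX".
    static int l2i(final Deque<Integer> nativeStack)
    {
        final int eax = nativeStack.pop(); // pop %eax : low-order 32 bits
        final int edx = nativeStack.pop(); // pop %edx : high-order 32 bits, simply ignored
        return eax;
    }

    public static void main(final String[] args)
    {
        final Deque<Integer> nativeStack = new ArrayDeque<>();
        pushLong(nativeStack, 50000000000L);
        System.out.println(l2i(nativeStack)); // -1539607552
    }
}
```

No arithmetic is involved: the downcast falls out of which register the template treats as the result.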
x86_64
The x86_64 template is similarly straightforward, but, because the general-purpose registers are already 64 bits wide, the ltos can be passed in a register (RAX), thus preventing a trip to the stack. Similar to x86_32, the resultant itos is stored in a register, RAX.
0x00007f9f48f01fe8: mov %eax,%eax
If you're unfamiliar with assembly, the instruction is moving the value in EAX back into EAX. At this point, you may have two questions:
- Why bother moving the register into itself?
- What's up with EAX? Aren't the ltos and itos supposed to be in RAX?
In x86_64, the E** registers are aliases to the low 32 bits of the corresponding R** register. For example, EAX will provide the low-order 32 bits of the value in RAX. Similarly, a write to an E** register stores the low-order 32 bits and zeroes the high-order 32 bits of the corresponding R** register. Given this information, reading the low-order bits of RAX (through EAX) and storing them into RAX (through EAX) effectively performs the downcast. RAX now contains the itos.
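The effect of the self-move can be sketched in Java by treating a long as the 64-bit register. Writing through EAX is modelled as masking off the upper 32 bits; the class and method names are hypothetical, and the mask is only a model of the hardware's zero-extension.

```java
// Illustrative sketch only: names are hypothetical; a long stands in for RAX.
final class ZeroExtendSketch
{
    // Models `mov %eax,%eax`: the low 32 bits are kept, the high 32 bits are zeroed.
    static long movEaxEax(final long rax)
    {
        return rax & 0xFFFFFFFFL;
    }

    public static void main(final String[] args)
    {
        final long before = -1L;                // ltos in RAX: all 64 bits set
        final long after = movEaxEax(before);   // itos in RAX: only the low 32 bits set

        System.out.println(String.format("%016x", before)); // ffffffffffffffff
        System.out.println(String.format("%016x", after));  // 00000000ffffffff
        System.out.println((int) after);                    // -1, the int read through EAX
    }
}
```

Note that the upper half ends up zeroed rather than sign-extended: read as a full 64-bit value, the register above holds 4294967295, but only the low 32 bits are meaningful for an itos.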
Considerations
This article probably isn't too useful unless you find yourself on the same journey as me: spelunking through the OpenJDK HotSpot JVM trying to become a better developer. If so, I hope this article is useful and informative.