PHP 7 magic function call trampoline

Introduction

This article will detail an optimization that's been added to PHP 7 virtual machine executor (Zend VM). We'll get back a minute into theorical concepts about function call trampolines, then we'll detail how those work into PHP 7.
It is better - if you want to fully understand - to have a nice overview on how the Zend VM works. I suggest you follow this article which details the internals of PHP 5 VM. Here, we'll talk about PHP 7. Although PHP 7 VM has been reworked, it barely works the same way as PHP 5's ; so understanding PHP 5's VM is a huge step forward to understand PHP 7's one.

It's all about recursion

If you've never heard about a trampoline, chances are you've never touched languages such as Haskell or Scala. Even not touching such languages, function call trampolines is a programmer trick usually learnt in deep-level programming courses. The goal of a trampoline is to prevent recursion in function calls. This is the theory mainline; there exists many ways of implementing such a mechanism in programs.

You know about recursion. If you don't, I suggest you start by learning what recursion is... Easy one ;-).
So, let's start by a quick and simple example :

	function factorial($n)
	{
		if ($n == 1) {
			return $n;
		}
		return $n * factorial($n-1);
	}

Yeah, everyone knows about this function. The factorial function is the easiest one to understand recursion. It got very few instructions, and above all : it calls itself.

You know that every time a function is called, many things happen. At the low level, the compiler must use a function calling convention to prepare such a call. Basically, it will push on the stack the arguments and the return address; then issue a CALL Opcode, branching the CPU into the first instruction of the function body. When this latter terminates, a RETURN is used (or generated by the compiler if none was found into the function body), telling the CPU to get rid of the arguments stack (resets the stack pointer) and get back to return address. Nice.

Low level recursion is not very disturbing, because at low level the CPU is pretty fast and the code well optimized. Under X86_64 Linux, the stack is nearly not used anymore, as many parameters are passed using CPU registers directly. Also, the stack is a contiguous piece of memory which will be prefetched and loaded into CPU caches, and thus accessing the stack space is very very fast.

However, when it comes to play with high level languages like PHP, things are pretty different. We use a virtual machine in high level languages, that means that we have to write ourselves a stack, a stack management etc... we have to virtualize those concepts as obviously high level languages are not low level languages, and thus can't benefit from the low level speed.

In a word, recursion in not that of a problem in low level (C or assembly), but becomes one in high level virtual-machine-based languages like PHP, where we would thus like to avoid it as much as possible.

Preventing recursion

There exists several ways to prevent recursion in function calls. We'll use the PHP language to study simple ones and a more generic one using what's called a trampoline function. We'll then get back to PHP sources to explain how those concepts have been added into PHP 7 heart.

Tail-call functions and loops

A recursion is a kind of loop : 'as long as XXX, call myself'. That's a sort of loop, and thus a recursive function can be rewritten using a loop (sometimes more than one), without calling itself anywhere.
Be warned however that such an exercise is not easy, depending on your function, how many times it calls back itself, and how it does so.

Fortunately, the factorial function is really easy to "un-recurse". There exists theoretical ways to do that, we'll use the tail-call tranformation to achieve that.
The factorial function can be unrolled by turning it into a tail-call recurse function, and applying some theorem. Let's turn it into a tail-call function first :

	function tail_factorial($n, $acc = 1)
	{
		if ($n == 1) {
			return $acc;
		}
		return tail_factorial($n-1, $n * $acc);
	}

We use what's called an accumulator to perform that. The goal is to end having a tail-call function.
As a reminder, a tail-call function is a function in where it comes to return itself, does return itself alone, with no other operation. That is the return statement is passed only the recursion call with no other operation at all. It must be single-instruction reentrant. That way a compiler can optimize the last call, because the function returns the return of itself, hence it can simplify the stack creation by simply reusing the current stack frame for the last call instead of creating a new one.

Tail Call optimization is a very old compiler optimization trick, used in many compilers. If you are interested in the subject, I suggest books, like Advanced Compiler Design and Implementation for example.

Also, we may now transform this tail-call function whose body simply represents a loop. What we need to do instead of calling back ourselves with changed arguments, is to jump back on the top of the function (like a recurse call would have done), but in between change the arguments, so that the next loop will run with the right argument values (what a recursive function does). That gives :

	function unrolled_factorial($n)
	{
		$acc = 1;
		while ($n > 1) {
			$acc *= $n--;
		}
		return $acc;
	}

This function does the exact same as the original factorial() function, but it does not call itself anymore. It is way more performant that the recursive alternative at runtime.

We also could have used a goto branch :

	function goto_factorial($n)
	{
		$acc = 1;
	f:
		if ($n == 1) {
			return $acc;
		}
		$acc *= $n--;
		goto f;
	}

This is exactly the same : no more recursion.

Try it. Try to run factorial() with a huge number : you'll run out of stack space and hit a memory limit of the engine (as the stack frames into the VM are allocated on the heap). If you disable the limit (memory_limit), then PHP will crash because you may know that PHP and the Zend VM have absolutely no guard against infinite recursion, hence the process crashes.
Now with the same argument try to run the unrolled one, unrolled_factorial() or even the goto_factorial(). It will not crash. Eventually it could take time to run, but won't crash, won't exhaust the stack space (heap allocated in PHP) and will run way faster than the recursive one.

Trampoline controlling tail-call functions

However, it happens that sometimes it is not that easy to unrecurse a function. Factorial is the simplest example, but some other are not.
Think about a function calling itself at different places, with different conditions, etc... (like a simple bsearch() implementation f.e).

In such cases, we may need to use what's called a function trampoline to master the recursion. The base recursive function will need to be rewritten (like when we un-recurse it), but it can keep the calls to itself this time. Simply, we will decorate such calls, and call base recursive function using a trampoline instead of invocating it directly.
That way, the recursion will be unrolled by having a control flow (a trampoline) mastering each call of our subject. There is no more need to deeply think about how to un-recurse such a complex function, simply wrap it, and don't launch it by itself but instead, launch it through some control code, called a trampoline.

Let's see an example of that using the PHP language. The idea is to transform our function, so that its caller can detect when it recurses, or when it leaves.
If it performs a recursive call to itself, the trampoline will control its stack by becoming its callee. If it returns its result, the trampoline should notice it and stop.

Like this :

	function trampo_factorial($n, $acc = 1)
	{
		if ($n == 1) {
			return $acc;
		}
		return function() use ($n, $acc) { return trampo_factorial($n-1, $n * $acc); };
	}

You see here that this function still calls itself. However, it doesn't directly call itself, but it wraps its recursive call into a closure.
This is because now we won't launch our recursive function directly, but using a trampoline. When the trampoline sees a closure returned : it launches it, when it sees a non-closure, it returns; hence the "trampoline" wording.

	function trampoline(callable $c, ...$args)
	{
		while (is_callable($c)) { $c = $c(...$args); } return $c;
	}

And we are done.
Use it like this :

	echo trampoline('trampo_factorial', 42);

The trampoline is a generic solution to recursion. If you can't refactor your function to eliminate recursive calls, then tranform it to a tail-call function that can be launched through a trampoline. Of course, trampoline only works with tail-call functions, how could it be otherwise ?

With a trampoline in action, this one will take some callables and launch them as often as needed, preventing those callables from calling themselves recursively, the trampoline being the callee.
Here again, we solved the recursion problem, in a much more generic way that can be applied to every recursive function.

Be warned however that I used the PHP language here for you to understand the concepts (as I guess you use PHP often if you read those lines ?). But, I would not recommand creating trampolines in PHP, PHP is a high level language and such constructs shouldn't be needed to PHP developpers in their daily jobs. You may not need that much of recursive functions using PHP, and having a loop with an is_callable() call into it is not that light.

However, lets now dive into the PHP engine and see how trampolines have been implemented to prevent stack recursion into the main PHP VM dispatch loop.

Recursion in the Zend VM Executor

So, let's go !
I hope you remember about the dispatch loop of the executor right ?

Let me refresh your mind.
Any virtual machine is built on the same big principles, among what the "dispatch loop".
An infinite loop is run, and at every step, one single VM instruction (opline) is run (handler()). Into this instruction, many things can happen, but at the end of each instruction is a command to the loop, usually "goto next instruction". It could also be "return from the infinite loop", or "jump to this operation".

The engine VM default dispatch loop is stored into the execute_ex() function.
Here it is, as a reminder, but for PHP 7, and with the optimizations related to my machine (IP and FP registers used) :

	#define ZEND_VM_FP_GLOBAL_REG "%r14"
	#define ZEND_VM_IP_GLOBAL_REG "%r15"
	
	register zend_execute_data* volatile execute_data __asm__(ZEND_VM_FP_GLOBAL_REG);
	register const zend_op* volatile opline __asm__(ZEND_VM_IP_GLOBAL_REG);
	
	ZEND_API void execute_ex(zend_execute_data *ex)
	{
		const zend_op *orig_opline           = opline;
		zend_execute_data *orig_execute_data = execute_data;
		execute_data                         = ex;
		opline                               = execute_data->opline;
		while (1) {
			opline->handler();
			if (UNEXPECTED(!opline)) {
				execute_data = orig_execute_data;
				opline       = orig_opline;
				return;
			}
		}
		zend_error_noreturn(E_CORE_ERROR, "Arrived at end of main loop which shouldn't happen");
	}

So you can notice the while(1) structure. What about recursion ? What's the matter ?

Well it is easy to spot. You run the while(1) loop, as part of the execute_ex() function. What happens if one instruction in there (opline->handler()) runs itself execute_ex() ? Well we are in a recursion case. Is it bad ? Well as usual : yes, if many levels of recursion happen to stack up.

In which case execute_ex() calls execute_ex() ? I can't go too deep into the VM engine here because you may miss many important informations to fully understand, but we could think that a PHP function call, calls for execute_ex(). It is the case.

Any time you perform a PHP function call, it creates a new stack frame at the C level, and runs a new version of the dispatch loop by re-entring into a new execute_ex() call, with new instructions to be run. When that loop exits, that is when the PHP function call ends, it leads to the "return" case in the code, and thus it ends the current loop of the current frame onto the stack, resuming the preceding one.
Take care however, this happens only for userland PHP functions. Because user-defined PHP functions are OPCodes to be run into the loop when launched. But, internal PHP functions (PHP functions designed in C, into the core or into PHP extensions) do not need to run opcodes, they are plain C instructions, thus they don't create another dispatch loop and another frame.

The __call() use case

Now I will explain you the __call() use case. What is __call() ? A userland PHP function. So ? So when it is run, it will lead to a new execute_ex() call, like any userland PHP function.
The specific problem of __call() is that is may be called many times, stacking up a lot of frames into the engine. Any time an unknown method is called on an object with an __call() defined in its class.

As of PHP 7, the engine has been optimized with the addition of a trampoline mastering __call() calls, and preventing recursive calls to execute_ex() in the __call() use-case.

This is __call()'s stack in PHP 5.6 :

56-call-stack

If you count the number of execute_ex() calls, you will see 3 of them. It is taken from a PHP script which invokes an unknown method on an object itself calling an unknown method on another object (whose classes obviously both contain __call()s. So, the first execute_ex() is the main script execution (position 6 in the call stack), and then on top of it we can count 2 other calls to execute_ex().

Now, the same script run under PHP 7 :

70-call-stack

The difference is obvious : the stack frame is much more thin, and there is only one call to execute_ex(), that is there is only one dispatch loop, dispatching every instruction including __call() calls.

Turning __call() calls to trampoline calls

In PHP 5, we used to call execute_ex() in __call() context. That is, we prepare a new dispatch loop to run the currently asked __call() OPCodes. But the method run, was the method called, for example fooBarDontExist(), so we had to fake its name as well. We had to allocate some structures, and perform a classic userland function call. Something like that (simplified) :

	ZEND_API void zend_std_call_user_call(INTERNAL_FUNCTION_PARAMETERS)
	{
		zend_internal_function *func = (zend_internal_function *)EG(current_execute_data)->function_state.function;
		zval *method_name_ptr, *method_args_ptr;
		zval *method_result_ptr = NULL;
		zend_class_entry *ce = Z_OBJCE_P(this_ptr);

		ALLOC_ZVAL(method_args_ptr);
		INIT_PZVAL(method_args_ptr);
		array_init_size(method_args_ptr, ZEND_NUM_ARGS());
		/* ... ... */
		ALLOC_ZVAL(method_name_ptr);
		INIT_PZVAL(method_name_ptr);
		ZVAL_STRING(method_name_ptr, func->function_name, 0); /* Copy function name */

		/* Perform a new dispatch loop call : this will call execute_ex() */
		zend_call_method_with_2_params(&this_ptr, ce, &ce->__call, ZEND_CALL_FUNC_NAME, &method_result_ptr, method_name_ptr, method_args_ptr);

		if (method_result_ptr) {
			RETVAL_ZVAL_FAST(method_result_ptr);
			zval_ptr_dtor(&method_result_ptr);
		}

		zval_ptr_dtor(&method_args_ptr);
		zval_ptr_dtor(&method_name_ptr);

		efree(func);
	}

This used to be a lot of work to perform such calls. That is why we used to hear "try to avoid using __call() for performance reasons" (among other reasons). This wasn't false.

Now in PHP 7. Remember the trampoline theory ? It will be more or less the same here.
What we want to avoid is recursively calling execute_ex(). To avoid that, we must un-recurse that, that is we must stay in the same execute_ex() context, and rebranch from it to its top, changing the arguments needed. Let's see that execute_ex() again :

	ZEND_API void execute_ex(zend_execute_data *ex)
	{
		const zend_op *orig_opline           = opline;
		zend_execute_data *orig_execute_data = execute_data;
		execute_data                         = ex;
		opline                               = execute_data->opline;
		while (1) {
			opline->handler();
			if (UNEXPECTED(!opline)) {
				execute_data = orig_execute_data;
				opline       = orig_opline;
				return;
			}
		}
		zend_error_noreturn(E_CORE_ERROR, "Arrived at end of main loop which shouldn't happen");
	}

So, the variables we need to change to prevent recursion calls are at least opline, and also execute_data (which contains the next OPCodes, opline is the "current" OPCode to run).
Hence, once __call() is met, we change opline and execute_data, then we return, we return back to the current dispatch loop, make it continue to our freshly changed new OPCodes, and at the end make it return to where it was (this is why we have also orig_opline and orig_execute_data ; a virtual machine dispatcher should always remember where it comes from so that it can branch back to here from anywhere).

Well, this is exactly what the new ZEND_CALL_TRAMPOLINE OPCode does. This new PHP 7 OPCode is used anywhere __call() calls must be performed.
Let's have a look at a simplified version :

	#define ZEND_VM_ENTER() execute_data = (executor_globals.current_execute_data); opline = ((execute_data)->opline); return
	static ZEND_OPCODE_HANDLER_RET ZEND_FASTCALL ZEND_CALL_TRAMPOLINE_SPEC_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
	{
		zend_array *args;
		zend_function *fbc = EX(func);
		zval *ret = EX(return_value);
		uint32_t call_info = EX_CALL_INFO() & (ZEND_CALL_NESTED | ZEND_CALL_TOP | ZEND_CALL_RELEASE_THIS);
		uint32_t num_args = EX_NUM_ARGS();
		zend_execute_data *call;

		/* ... */

		SAVE_OPLINE();
		call = execute_data;
		execute_data = EG(current_execute_data) = EX(prev_execute_data);

		/* ... */

		if (EXPECTED(fbc->type == ZEND_USER_FUNCTION)) {
			call->symbol_table = NULL;
			i_init_func_execute_data(call, &fbc->op_array,
					ret, (fbc->common.fn_flags & ZEND_ACC_STATIC) == 0);
			if (EXPECTED(zend_execute_ex == execute_ex)) {
				ZEND_VM_ENTER();
			}
		/* ... */

We can spot here that the execute_data and opline variables are effectively changed, by the ZEND_VM_ENTER() macro.
The next execute_data are prepared in the call variable, and the function i_init_func_execute_data() bind them. Then, a new turn of the current dispatch loop is performed with ZEND_VM_ENTER() which switches the variables for the next loop, and instruct to enter into it with a "return" (of the current loop).

The circle is complete, we are done.

How to reach back the main loop then ? Well this is done in the ZEND_RETURN OPCode which ends every user-defined function call whatever it is.

	#define LOAD_NEXT_OPLINE() opline = ((execute_data)->opline) + 1
	#define ZEND_VM_LEAVE() return
	
	static ZEND_OPCODE_HANDLER_RET ZEND_FASTCALL zend_leave_helper_SPEC(ZEND_OPCODE_HANDLER_ARGS)
	{
		zend_execute_data *old_execute_data;
		uint32_t call_info = EX_CALL_INFO();

		if (EXPECTED(ZEND_CALL_KIND_EX(call_info) == ZEND_CALL_NESTED_FUNCTION)) {
			zend_object *object;

			i_free_compiled_variables(execute_data);
			if (UNEXPECTED(EX(symbol_table) != NULL)) {
				zend_clean_and_cache_symbol_table(EX(symbol_table));
			}
			zend_vm_stack_free_extra_args_ex(call_info, execute_data);

			old_execute_data = execute_data;
			execute_data = EG(current_execute_data) = EX(prev_execute_data);
			
			/* ... */

			LOAD_NEXT_OPLINE();
			ZEND_VM_LEAVE();
		}
		/* ... */

Like we can see, on returning from a user-defined function call, we hit a ZEND_RETURN which replaces the next-to-be-run instructions by the previous ones, from the previous call : prev_execute_data. It then loads the opline, and returns to the main dispatch loop.

Conclusion

We've detailed the theory which resides behind the unrolling of recursive function calls. Any recursive call can be flatten, but it may happen to be very hard to do. A more generic solution is to design a trampoline : a system that takes care of launching every step of a recursive function, preventing that letter from calling itself and thus blowing up the stack frame.
Some code, known as the trampoline code, sits in place of the dispatcher and plays with it to prevent recursion to happen.

We've seen a generic implementation in PHP, and also how trampolines have been implemented into the new Zend Engine 3, part of PHP 7.
Do no fear __call() calls anymore in PHP 7, they are faster than PHP 5's, they don't create a new stack frame (at the C level) on every call, and are part of many improvements added to the PHP 7 engine.