Monday, July 24, 2006

Fun with Perl module loading

Q. So exactly when does a BEGIN block get executed?

A. As soon as it is encountered! That is, at compile time, the moment perl has parsed the block's closing brace.

For example, take the following code:

use AModule;
use BModule;

BEGIN {
    print __PACKAGE__." BEGIN block\n";
}

use AnOther;

$syntax_error = 42    # deliberately missing semicolon

print "Starting main program\n";
print "Ending main program\n";

The order of execution is:
AModule BEGIN block
AModule main body

BModule BEGIN block
BModule main body

main BEGIN block

AnOther BEGIN block
AnOther main body

syntax error at main.pl line 18, near "print"
Execution of main.pl aborted due to compilation errors.

So the main program's BEGIN block is not necessarily the last one executed. Normally it is, since we usually use all our modules before the BEGIN block in main, but we don't have to.
Note also that the blocks are executed even though the program contains a syntax error and fails to compile: each BEGIN block and use statement runs as soon as it has been compiled, before perl ever reaches the faulty line.
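
The same behaviour can be seen in a single file. This is only a minimal sketch: the @events array and the eval'd snippet are made up for the demonstration, with a string eval standing in for a second file that has a syntax error after its BEGIN block.

```perl
use strict;
use warnings;

our @events;    # package variable, shared by the BEGIN blocks and the eval'd snippet

push @events, 'main: first run-time statement';

# Appears later in the file, but runs first: it executes during compilation.
BEGIN { push @events, 'main: BEGIN block' }

# Compile a snippet whose syntax error comes *after* its BEGIN block.
my $compiled = eval q{
    BEGIN { push @events, 'snippet: BEGIN block' }
    my $answer = 42
    push @events, 'snippet: body';
    1;
};
push @events, 'snippet failed to compile' unless $compiled;

print "$_\n" for @events;
```

The main BEGIN block is recorded before the first run-time statement, and the snippet's BEGIN block is recorded even though the snippet never compiles, so its body never runs.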


Q. What use is that?

A. We can alter the way subsequent modules are loaded.

Usually that will be by altering @INC:

BEGIN {
    if ( defined $ENV{TESTLIB} ) {
        unshift @INC, $ENV{TESTLIB};
    }
}


Now all subsequent module loads will search the directory named in the TESTLIB environment variable first.
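
To try this without preparing a directory by hand, here is a self-contained sketch. The module TestlibDemo and its origin() function are invented for the demonstration, and the first BEGIN block merely stands in for whatever would normally set TESTLIB in your environment.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Demo setup (hypothetical): write TestlibDemo.pm into a throwaway directory
# and point TESTLIB at it.
BEGIN {
    my $dir = tempdir( CLEANUP => 1 );
    open my $fh, '>', "$dir/TestlibDemo.pm" or die "Cannot write module: $!";
    print {$fh} "package TestlibDemo;\nsub origin { 'test library' }\n1;\n";
    close $fh or die "Cannot close module: $!";
    $ENV{TESTLIB} = $dir;
}

# The block from the text: put the TESTLIB directory at the front of @INC.
BEGIN {
    if ( defined $ENV{TESTLIB} ) {
        unshift @INC, $ENV{TESTLIB};
    }
}

use TestlibDemo;    # only findable because of the unshift above

print "First entry in \@INC: $INC[0]\n";
print "Origin: ", TestlibDemo::origin(), "\n";
```

Both BEGIN blocks have to come before the use line, because use is itself carried out at compile time.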

Q. @INC is just a list of directories, right?

A. Wrong! It can also contain code references.

When the module loader comes across a subroutine reference while walking through @INC, it calls that subroutine there and then:


BEGIN {
    print __PACKAGE__." BEGIN block\n";
    # Return nothing so perl carries on searching the rest of @INC
    my $code = sub { print "\@INC code\n"; return };
    unshift @INC, $code;
}
use AnOther;

Gives:

main BEGIN block
@INC code
AnOther BEGIN block

Q. Is that it?

A. Of course not.

The subroutine is passed two arguments: the code reference itself and the file name of the module perl is trying to load (AModule.pm for AModule, Foo/Bar.pm for Foo::Bar). This enables us to track every module load. So:

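BEGIN {
    print __PACKAGE__." BEGIN block\n";
    my $code = sub {
        my (undef, $loading) = @_;    # code ref itself, then the file being loaded
        my ($package, $filename, $line) = caller;

        print "Loading $loading from $package\n";
        return;    # nothing to offer: perl continues with the rest of @INC
    };
    unshift @INC, $code;
}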

use AModule;
use BModule;
use AnOther;

Gives:

main BEGIN block
Loading AModule.pm from main
AModule BEGIN block
AModule main body

Loading BModule.pm from main
BModule BEGIN block
BModule main body

Loading AnOther.pm from main
AnOther BEGIN block

Of course, further diagnostics could be added, like filename, line, and date/time. I'll leave that to you.
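
If you would rather not do it yourself, one possible version is sketched below. The @load_log array is my own addition so the entries can be inspected afterwards, and File::Basename is just an arbitrary core module to load.

```perl
use strict;
use warnings;

our @load_log;    # hypothetical: keeps every entry for later inspection

BEGIN {
    unshift @INC, sub {
        my ( undef, $loading ) = @_;
        my ( $package, $filename, $line ) = caller;

        my $entry = sprintf '%s: loading %s from %s (%s line %d)',
            scalar( localtime() ), $loading, $package, $filename, $line;
        push @load_log, $entry;
        print "$entry\n";

        return;    # nothing to offer: perl continues with the rest of @INC
    };
}

use File::Basename;    # arbitrary core module, used only to trigger the hook

print scalar(@load_log), " load(s) logged\n";
```

Modules that File::Basename pulls in itself are logged too, provided they have not already been loaded, with File::Basename reported as the requesting package.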

Q. So, @INC is more powerful than its sister %INC, which is just used for diagnostics?

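A. Err, no. perl uses %INC to see whether a module is already loaded. Each key is a file name as given to require (such as A.pm) and each value is the path it was loaded from.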
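
A quick way to see what perl has recorded, as a minimal sketch (File::Basename is again just an arbitrary core module picked for the demonstration):

```perl
use strict;
use warnings;
use File::Basename;    # arbitrary core module, chosen only for the demonstration

# Each key is the name as require saw it; each value is the file it came from.
for my $name ( sort keys %INC ) {
    print "$name => $INC{$name}\n";
}

# A second require finds the entry in %INC and returns without reloading.
my $path_before = $INC{'File/Basename.pm'};
require File::Basename;
print "Entry unchanged\n" if $INC{'File/Basename.pm'} eq $path_before;
```

The paths printed depend on where your perl is installed.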

Q. And how is that useful?

A. We can force a module reload by removing its entry from %INC. Suppose that in one part of our code we want to force a different version of a module to be loaded (don't ask why):

use strict;
use warnings; # DON'T use -w

use A;
use MyB;
use C;

A::mysub(); # Original modules used

delete $INC{'A.pm'}; # Force perl to reload

unshift @INC,'mydir'; # Change @INC
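{
    # Lexically scoped: covers this block only. Whether the reload warns
    # "Subroutine mysub redefined" depends on the warnings settings inside A.pm.
    no warnings 'redefine';
    require 'A.pm'; # Reload module
}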

A::mysub(); # Module in 'mydir' used

Work it out yourself!
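
And if you want to check your answer by running something, here is a self-contained sketch of the same technique. The module ReloadDemo and the two temporary directories are invented for the demonstration. The generated module does not enable warnings itself, so no "redefined" warning should appear.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical setup: two directories, each holding its own ReloadDemo.pm.
my %dir;
for my $version (qw(original replacement)) {
    $dir{$version} = tempdir( CLEANUP => 1 );
    open my $fh, '>', "$dir{$version}/ReloadDemo.pm" or die "Cannot write module: $!";
    print {$fh} "package ReloadDemo;\nsub mysub { '${version}' }\n1;\n";
    close $fh or die "Cannot close module: $!";
}

unshift @INC, $dir{original};
require ReloadDemo;                  # first load, from the 'original' directory
my $before = ReloadDemo::mysub();

delete $INC{'ReloadDemo.pm'};        # make perl forget the module was loaded
unshift @INC, $dir{replacement};     # this directory is now searched first
require ReloadDemo;                  # compiled again, from the new directory
my $after = ReloadDemo::mysub();

print "before: $before\n";
print "after:  $after\n";
```

Without the delete, the second require would find ReloadDemo.pm in %INC and do nothing, so mysub would still return 'original'.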