Thursday, August 31, 2006


This is a reminder post for (selfishly) me! Follow along if you like.
(CPAN) Win32::ChangeNotify - does it support (Win32 API) ReadDirectoryChangesW and overlapped IO? If not, how does it avoid missing changes?

(CPAN) Linux::Inotify (and Inotify2) support the Linux kernel API inotify, which (perhaps) replaces dnotify. Supported from kernel 2.6.13.
Neither interface is on Fedora Core 4, which is 2.6.11.

Further digging required.

Monday, July 24, 2006

Fun with Perl module loading

Q. So exactly when does a BEGIN block get executed?

A. As soon as it is encountered!

For example, take the following code:

use AModule;
use BModule;

BEGIN {
    print __PACKAGE__." BEGIN block\n";
}

use AnOther;

$syntax_error = 42    # deliberate syntax error: no semicolon

print "Starting main program\n";
print "Ending main program\n";

The order of execution is:

AModule BEGIN block
AModule main body

BModule BEGIN block
BModule main body

main BEGIN block

AnOther BEGIN block
AnOther main body

syntax error at line 18, near "print"
Execution of aborted due to compilation errors.

So, the main program's BEGIN block is not necessarily the last one executed. Granted, normally it is, since we usually use all our modules before the BEGIN block in main, but we don't have to.
Note also that the blocks are executed even though we have a syntax error, and the program fails to compile.
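
Here is a minimal sketch of the same ordering in a single file, so no extra modules are needed. The array name @order is my own invention for the illustration; each piece of code simply records when it ran.

```perl
use strict;
use warnings;

our @order;    # hypothetical name: records the order in which code runs

push @order, 'run time';                 # executed only once compilation is complete
BEGIN { push @order, 'first BEGIN' }     # executed as soon as it has been parsed
BEGIN { push @order, 'second BEGIN' }    # BEGIN blocks run in the order they appear

print join( ', ', @order ), "\n";        # first BEGIN, second BEGIN, run time
```

The run-time push appears first in the file but is recorded last, because both BEGIN blocks have already run by the time the compiler reaches the end of the file.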

Q. What use is that?

A. We can alter the way subsequent modules are loaded.

Usually that will be by altering @INC inside the BEGIN block:

if ( defined $ENV{TESTLIB} ) {
    unshift @INC, $ENV{TESTLIB};
}

Now all subsequent module loads will search the directory in the environment variable first.

Q. @INC is just a list of directories, Right?

A. Wrong! It can also contain code references.

References to subroutines found by the module loader will be executed as they are found:

BEGIN {
    print __PACKAGE__." BEGIN block\n";
    my $code = sub { print "\@INC code\n" };
    unshift @INC,$code;
}
use AnOther;


main BEGIN block
@INC code
AnOther BEGIN block

Q. Is that it?

A. Of course not.

The subroutine is passed two arguments: the code reference itself and the file name of the module perl is trying to load (AModule.pm, for example). This enables us to track every module loaded. So:

BEGIN {
    print __PACKAGE__." BEGIN block\n";
    my $code = sub {
        my (undef, $loading) = @_;
        my ($package, $filename, $line) = caller;

        print "Loading $loading from $package\n";
        return;    # decline, so perl carries on searching @INC
    };
    unshift @INC,$code;
}

use AModule;
use BModule;
use AnOther;


main BEGIN block
Loading AModule.pm from main
AModule BEGIN block
AModule main body

Loading BModule.pm from main
BModule BEGIN block
BModule main body

Loading AnOther.pm from main
AnOther BEGIN block

Of course, further diagnostics could be added, like filename, line, and date/time. I'll leave that to you.
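
For what it is worth, here is one way those extra diagnostics might look. It is only a sketch: the array name @load_log is made up for the example, and Time::Local is just a convenient core module that has not been loaded yet. The hook returns nothing, so perl carries on and finds the module in the usual directories.

```perl
use strict;
use warnings;

our @load_log;    # hypothetical name: one entry per file perl goes looking for

BEGIN {
    unshift @INC, sub {
        my ( undef, $loading ) = @_;
        my ( $package, $filename, $line ) = caller;

        # Record when the load was requested, and where the request came from.
        push @load_log, sprintf( '%s: %s requested by %s at %s line %d',
            scalar(localtime), $loading, $package, $filename, $line );

        return;    # decline to supply the module; perl continues down @INC
    };
}

require Time::Local;    # any module not already in %INC will trigger the hook

print "$_\n" for @load_log;
```

Modules that are already in %INC (strict and warnings here) never reach the hook, which is a nice preview of the next question.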

Q. So, @INC is more powerful than its sister %INC, which is just used for diagnostics.

A. Err, no. %INC is used by perl to see if a module is already loaded.

Q. And how is that useful?

A. We can force a module reload by removing its entry from %INC. Consider:
In one part of our code we want to force a different version of a module to be loaded (don't ask why).
use strict;
use warnings; # DON'T use -w

use A;
use MyB;
use C;

A::mysub(); # Original modules used

delete $INC{'A.pm'}; # Force perl to reload

unshift @INC,'mydir'; # Change @INC
no warnings 'redefine'; # No 'redefined' warnings
require 'A.pm'; # Reload module

A::mysub(); # Module in 'mydir' used

Work it out yourself!
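
If you would rather see it run, here is a self-contained sketch of the same trick. Everything in it is invented for the demonstration: the module name Greeter, its which() function, and the two temporary directories standing in for the normal library and 'mydir'.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write two versions of a hypothetical module, Greeter.pm, into separate directories.
my %dir;
for my $version (qw(original replacement)) {
    $dir{$version} = tempdir( CLEANUP => 1 );
    open my $fh, '>', "$dir{$version}/Greeter.pm" or die "Cannot write module: $!";
    print {$fh} "package Greeter;\nsub which { return '$version' }\n1;\n";
    close $fh or die "Cannot close module: $!";
}

unshift @INC, $dir{original};
require Greeter;                   # first load: found in the 'original' directory
my $before = Greeter::which();

delete $INC{'Greeter.pm'};         # make perl forget that it was ever loaded
unshift @INC, $dir{replacement};   # the other directory is now searched first
require Greeter;                   # loaded again, this time from 'replacement'
my $after = Greeter::which();

print "before: $before, after: $after\n";    # before: original, after: replacement
```

Without the delete, the second require would see the %INC entry and do nothing at all.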

Thursday, May 04, 2006

Splitting whitespace

The documentation for Perl is quite good usually, but when it comes to perldoc -f split there is so much magic involved that normal English begins to break down. Take the default arguments, for example.

"If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace)."

Then, later in the documentation:

"As a special case, specifying a PATTERN of space (' ') will split on white space just as "split" with no arguments does. … A "split" with no arguments really does a "split(' ', $_)" internally."

So, how many whitespace characters is that? A single space as a delimiter implies a single space is used for the split, but it actually does 'one or more whitespace':
$_ = '    This    is    some    text';
@a = split;
$" = '|';
print "@a\n";
So leading whitespace is ignored, and one or more whitespace characters are used as the delimiter. There is a (documented) subtle difference with \s+:
$_ = '    This    is    some    text';
@a = split /\s+/;
$" = '|';
print "@a\n";
Notice that the first element of the resulting list is empty, which was not previously the case.
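
To make the difference easier to see, here is a small sketch that runs both forms on the same string and prints the results. The variable names are mine, and I have used join rather than $" so that nothing global is changed.

```perl
use strict;
use warnings;

my $text = '    This    is    some    text';

my @awk_style = split ' ',   $text;    # the special case: leading whitespace skipped
my @regex     = split /\s+/, $text;    # ordinary regex: leading empty field kept

print join( '|', @awk_style ), "\n";   # This|is|some|text
print join( '|', @regex ),     "\n";   # |This|is|some|text
```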

A few questions arise from this. First, what is this ' ' syntax all about? Don't we need a regular expression match?
$_ = 'xxxThisxxisxxsomexxtext';
@a = split 'x';
$" = '|';
print "@a\n";
So it does work, but not in the same way as a single space: it does not match 'one or more' (x+), so the space is magic. To be fair, the documentation does say that ' ' is a special case. But the documentation does not show the syntax of a string literal; it specifically shows an RE delimited with / /. In fact split accepts any expression as its pattern, so a quoted string works with regular expressions, and with multiple characters (without a leading 'm'). Other delimiter characters do not work unless preceded by 'm'.
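
A quick sketch to back that up, again with my own variable names. A quoted 'x' behaves as the one-character pattern /x/, and a quoted 'x+' behaves as the regular expression /x+/.

```perl
use strict;
use warnings;

my $text = 'xxxThisxxisxxsomexxtext';

my @plain = split 'x',  $text;    # same as /x/: every single x is a delimiter
my @plus  = split 'x+', $text;    # same as /x+/: a run of x counts as one delimiter

print join( '|', @plain ), "\n";  # |||This||is||some||text
print join( '|', @plus ),  "\n";  # |This|is|some|text
```

Neither form drops the leading empty field, which is one more thing that only ' ' does.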

Second question. What does whitespace mean? Is ' ' the same as \s in this case? Normally, of course, it is not, but in this case it is! ' ' is very special.

Thursday, April 27, 2006

Inside-out accessor methods - version 2

This solution uses a hash of hash-references, which means we no longer need the nasty eval. With thanks to the Perl Monks.

use Carp;
use Scalar::Util qw(refaddr);

my (%speed, %reg, %owner, %mileage);
my %hashrefs = ( speed => \%speed,
                 reg => \%reg,
                 owner => \%owner,
                 mileage => \%mileage);

sub set {
    my ($self, $attr, $value) = @_;
    my $key = refaddr $self;

    if ( !exists $hashrefs{$attr} ) {
        carp "Invalid attribute name $attr";
    }
    else {
        $hashrefs{$attr}{$key} = $value;
    }
}
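
For completeness, here is a sketch of how the whole class might fit together. The package name Car, the constructor, the get method and the DESTROY method are my assumptions about the surrounding code, not something shown above; only the lookup through %hashrefs is the technique itself.

```perl
package Car;    # hypothetical class name, used only for this sketch
use strict;
use warnings;
use Carp;
use Scalar::Util qw(refaddr);

my ( %speed, %reg, %owner, %mileage );
my %hashrefs = (
    speed   => \%speed,
    reg     => \%reg,
    owner   => \%owner,
    mileage => \%mileage,
);

# The object is an empty blessed scalar; its address is the only thing it carries.
sub new {
    my ($class) = @_;
    return bless \do { my $anon }, $class;
}

sub set {
    my ( $self, $attr, $value ) = @_;
    my $key = refaddr $self;
    if ( !exists $hashrefs{$attr} ) {
        carp "Invalid attribute name $attr";
        return;
    }
    $hashrefs{$attr}{$key} = $value;
    return $value;
}

sub get {
    my ( $self, $attr ) = @_;
    my $key = refaddr $self;
    if ( !exists $hashrefs{$attr} ) {
        carp "Invalid attribute name $attr";
        return;
    }
    return $hashrefs{$attr}{$key};
}

# Remove this object's entries so the attribute hashes do not grow for ever.
sub DESTROY {
    my ($self) = @_;
    my $key = refaddr $self;
    delete $_->{$key} for values %hashrefs;
    return;
}

package main;

my $car = Car->new;
$car->set( speed => 70 );
$car->set( owner => 'Fred' );
print $car->get('owner'), ' is doing ', $car->get('speed'), "\n";    # Fred is doing 70
```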

Wednesday, April 26, 2006

UNIX commands to Perl

This is a rough conversion table from UNIX commands to Perl.
Please note that there are not always direct single equivalents.

UNIX          Perl                             Origin
.             do                               built-in
awk           perl ;-) (often 'split')         built-in
basename      File::Basename::basename         Base module
cat           while(<>){print}                 built-in
              ExtUtils::Command::cat           Base module
cd            chdir                            built-in
chmod         chmod                            built-in
chown         chown                            built-in
cp            File::Copy                       Base module
              ExtUtils::Command::cp            Base module
date          localtime                        built-in
              POSIX::strftime                  Base module
declare       see typeset
df            Filesys::Df                      CPAN
diff          File::Compare                    Base module
dirname       File::Basename::dirname          Base module
echo          print                            built-in
egrep         while(<>){print if /re/}         built-in
eval          eval                             built-in
exec          exec                             built-in
              pipe (co-processes)              built-in
export        Assign to %ENV                   Hash variable
find          File::Find                       Base module
ftp           Net::FTP                         Base module
function      sub                              built-in
grep          see egrep
integer       int                              built-in
kill          kill                             built-in
ln            link                             built-in
ln -s         symlink                          built-in
ls            glob                             built-in
              opendir/readdir/closedir         built-in
              stat/lstat                       built-in
mkdir         mkdir                            built-in
mkdir -p      ExtUtils::Command::mkpath        Base module
mv            rename                           built-in
              ExtUtils::Command::mv            Base module
od            ord                              built-in
              printf                           built-in
print         print                            built-in
printf        printf                           built-in
rand          rand                             built-in
rm            unlink                           built-in
rm -f         ExtUtils::Command::rm_f          Base module
rm -rf        ExtUtils::Command::rm_rf         Base module
sed           s/// (usually)                   built-in
sleep         sleep                            built-in
              alarm                            built-in
sort          sort                             built-in
source        do                               built-in
times         times                            built-in
touch         open()/close()                   built-in
              ExtUtils::Command::touch         Base module
trap          %SIG                             Hash variable
              sigtrap                          pragma
typeset       my                               built-in
typeset -i    int                              built-in
typeset -l    lc                               built-in
typeset -u    uc                               built-in
typeset -Z    sprintf                          built-in
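
As a rough illustration of how a few of these rows read in practice, here is a short sketch. The path and the GREETING variable are made up for the example.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);
use POSIX qw(strftime);

my $path = '/usr/local/bin/perl';                 # example path, chosen for illustration

my $base  = basename($path);                      # basename: 'perl'
my $dir   = dirname($path);                       # dirname:  '/usr/local/bin'
my $today = strftime( '%Y-%m-%d', localtime );    # date +%Y-%m-%d
$ENV{GREETING} = 'hello';                         # export GREETING=hello
my $upper  = uc $ENV{GREETING};                   # typeset -u
my $padded = sprintf '%05d', 42;                  # typeset -Z5

print "$base\n$dir\n$today\n$upper\n$padded\n";
```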

Tuesday, April 11, 2006


Walking across London I spotted a poster which screamed:
100 Gig tickets to be won!
"Must be a big stadium", I thought....

Accessor methods - update

I have a much better solution, with the help of the perlmonks. I'll post it when I get the chance.

Thursday, March 23, 2006

Accessor methods for inside-out objects

I have just been looking at Class::Accessor, recommended by Simon Cozens. It automagically generates accessor methods, but appears to rely on a %fields type approach, and will not work on inside-out objects.
I got to thinking. Actually, generalised accessors for inside-out objects are not that difficult, but need some ev[ai]l doings.

use strict;
use Carp;
use Scalar::Util qw(refaddr);

my %speed;
my %reg;
my %owner;
my %mileage;

sub set {
    my ($self, $attr, $value) = @_;
    my $key = refaddr $self;

    my $hashref;
    eval "\$hashref = \\\%$attr";    # fails under strict if no such hash is declared

    if ( !defined $hashref ) {
        carp 'Invalid attribute name';
    }
    else {
        $hashref->{$key} = $value;
    }
}

sub get {
    my ($self, $attr) = @_;
    my $key = refaddr $self;

    my $hashref;
    eval "\$hashref = \\\%$attr";

    if ( !defined $hashref ) {
        carp 'Invalid attribute name';
        return;
    }

    return $hashref->{$key};
}

BTW: I also found how to do formatting: <pre>...</pre>

Wednesday, March 08, 2006

More of the same..

I haven't posted for a while. It's not that nothing has happened, it's just that too much has! I'm afraid that the blog suffers when I'm busy. I have been madly updating our 'Advanced Perl with CGI and Web Applications' course, and I am quite pleased with the result. As always I reckon I learnt at least as much as the delegates will (although maybe on more obscure subjects).
For example, did you know you can use __DIE__ and __WARN__ signal handlers to trap Carp::croak and Carp::carp calls?
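
A minimal sketch of what I mean, with made-up names (@trapped and risky) for the demonstration. carp ends up calling warn and croak ends up calling die, so the two handlers see both messages.

```perl
use strict;
use warnings;
use Carp;

my @trapped;    # hypothetical name: collects everything the handlers see

local $SIG{__WARN__} = sub { push @trapped, "warn: $_[0]" };
local $SIG{__DIE__}  = sub { push @trapped, "die: $_[0]" };

sub risky {
    carp 'something looks odd';    # routed through the __WARN__ handler
    croak 'giving up';             # routed through the __DIE__ handler, then dies
}

eval { risky() };    # the die still happens after the handler returns

print @trapped;
```

Note that the __DIE__ handler is called even though the croak is inside an eval.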

I returned home from teaching in the USA last Saturday to my latest multiple order from Amazon:
Object Oriented Perl by Damian Conway. A little out of date (2000) but good nonetheless, written in typical Damian style. Hacking the Advanced Perl course (two chapters on OO) and reading this book has totally changed my views on Perl as an OO language. The book does not contain anything on inside-out objects, but you can see that Damian was almost there. We now have a chapter on this simple but effective means of encapsulation in the Advanced course.

Number two book is the "Pickaxe book", Programming Ruby. About time I read it, more on that later (hopefully).

Number three is "PHP and MySQL Web Development". A huge tome which (unlike OOP and Ruby) does not look like 'fun'.
Last but not least another O'Reilly pocket book, this time the Linux one. All these pocket books, I need huge pockets. Not to mention another bookshelf....

Monday, January 16, 2006

What a refreshing change

I'm on the West coast of Ireland, Galway, teaching. I don't think anyone will be offended if I say that Galway is a little remote.
The air and the Guinness are both refreshing, but what stopped me in my tracks is the co-operation between businesses. One major international company booked me to give a course, but then said to others in Galway, "Hey, we have a technical course running this week, anyone else interested?", and so along come a few guys from other companies that could not justify the course on their own. Apparently this happens all the time.
These companies are major internationals acting like sensible human beings. Now why can't that happen everywhere?

Wednesday, January 04, 2006

IFS and read

I find myself continually underestimating the Korn shell. I always thought that Bash was neat because we could read into an array with read -a, but of course we can do the same in ksh with read -A. The effect is to split up the input into fields around $IFS. But did you know we can set IFS just for a read statement? Look carefully:

IFS=','
while IFS=':' read -A line
do
    end=$(( ${#line[*]} - 1 ))
    if [[ ${line[$end]} != /sbin/nologin ]]
    then
        echo "${line[*]}"
    fi
done < /etc/passwd

(display all the lines in /etc/passwd whose last field is not /sbin/nologin, changing the field delimiter from colon to comma).

The first IFS=',' sets IFS for the echo expansion and is not overridden by the IFS on the read line. That IFS only applies for the read, nothing else.

By the way, did you know that the expansion (including $*) uses the first character from IFS, so order matters? So IFS=',.:' gives different results to IFS=':,.'.

All this applies to the standard POSIX shell, as well as Korn shell and Bash, and I was alerted to these features by Classic Shell Scripting (see a previous post).

Finally, a significant bug in pdksh has been fixed in the version with Fedora Core 4 (1993-12-28 q). In early versions a pipe generated a child process on each side, so piping into a while loop was not useful, since the while loop ran in a different process to the rest of the script, and so variables could not be saved. Now it all works. Consider:

ps -ef | while read uid pid ppid c stime tty time cmd
do
    if [[ $cmd == *xinetd* ]]
    then
        break
    fi
done
echo "Pid of xinetd is: $pid"

Unfortunately this will still not work in Bash, for the same reason as it did not work in the old pdksh. In Bash we can use process substitution instead, and while we are about it we may as well use an ERE (because we can):

re='^xinetd +'    # kept in a variable: newer Bash treats a quoted pattern literally
while read uid pid ppid c stime tty time cmd
do
    if [[ $cmd =~ $re ]]
    then
        break
    fi
done < <(ps -ef)

echo "Pid of xinetd is: $pid"

The man pages for the new pdksh say process substitution works in Korn shell as well, and I am told this works on Solaris, but it does not appear to work on Linux.