The Darke Side: 2006

Thursday, August 31, 2006

Notifications

This is a reminder post for (selfishly) me! Follow along if you like.
(CPAN) Win32::ChangeNotify - does it support (Win32 API) ReadDirectoryChangesW and overlapped IO? If not, how does it avoid missing changes?

(CPAN) Linux::Inotify (and Linux::Inotify2) support the Linux kernel API inotify, which (perhaps) replaces dnotify. Supported from kernel 2.6.13.
Neither interface is on Fedora Core 4, which is 2.6.11.

Further digging required.

Monday, July 24, 2006

Fun with Perl module loading

Q. So exactly when does a BEGIN block get executed?

A. As soon as it is encountered!

For example, take the following code:


  use AModule;
  use BModule;

  BEGIN {
     print __PACKAGE__." BEGIN block\n";
  }

  use AnOther;

  $syntax_error = 42     # deliberate mistake: no semicolon

  print "Starting main program\n";
  print "Ending main program\n";

The order of execution is:

   AModule BEGIN block
   AModule main body

   BModule BEGIN block
   BModule main body

   main BEGIN block

   AnOther BEGIN block
   AnOther main body

   syntax error at main.pl line 18, near "print"
   Execution of main.pl aborted due to compilation errors.

So, the main program's BEGIN block is not necessarily the last one executed. Granted, normally it is, since we usually use all our modules before the BEGIN block in main, but we don't have to.
Note also that the blocks are executed even though we have a syntax error, and the program fails to compile.
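
The same effect can be seen in a single file, without any modules. This is only a minimal sketch (the @order array is a name made up for the illustration): both BEGIN blocks run while the file is still being compiled, so they finish before the first ordinary statement starts.

```perl
use strict;
use warnings;

our @order;                                   # records what ran, in order

push @order, 'first run-time statement';
BEGIN { push @order, 'first BEGIN' }          # runs as soon as it is compiled
push @order, 'second run-time statement';
BEGIN { push @order, 'second BEGIN' }         # so does this one

print "$_\n" for @order;
```

Both BEGIN entries are printed ahead of the two run-time statements, even though they appear later in the source.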

Q. What use is that?

A. We can alter the way subsequent modules are loaded

Usually that will be by altering @INC:


  BEGIN {
     if ( defined $ENV{TESTLIB} ) {
         unshift @INC, $ENV{TESTLIB}
     }
  }

Now all subsequent module loads will search the directory in the environment variable first.

Q. @INC is just a list of directories, Right?

A. Wrong! It can also contain code references.

References to subroutines found by the module loader will be executed as they are found:


  BEGIN {
     print __PACKAGE__." BEGIN block\n";
     my $code = sub { print "\@INC code\n" };
     unshift @INC,$code;
  }
  use AnOther;

Gives:


  main BEGIN block
  @INC code
  AnOther BEGIN block

Q. Is that it?

A. Of course not.

The subroutine is passed two arguments: the code reference itself and the file name of the module it is trying to load (for example AModule.pm). This enables us to track every module loaded. So:


  BEGIN {
     print __PACKAGE__." BEGIN block\n";
     my $code = sub {
        my (undef, $loading) = @_;
        my ($package, $filename, $line) = caller;

        print "Loading $loading from $package\n"
     };
     unshift @INC,$code;
  }

  use AModule;
  use BModule;
  use AnOther;

Gives:

  main BEGIN block
  Loading AModule.pm from main
  AModule BEGIN block
  AModule main body

  Loading BModule.pm from main
  BModule BEGIN block
  BModule main body

  Loading AnOther.pm from main
  AnOther BEGIN block

Of course, further diagnostics could be added, like filename, line, and date/time. I'll leave that to you.
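
One way to take up that invitation, offered as a sketch rather than the definitive answer: the hook below also records the calling file, the line and a timestamp. The @load_log array is a made-up name, and File::Basename is used only because it is a core module that is unlikely to be loaded already. The hook returns an empty list, which tells perl to carry on searching the rest of @INC.

```perl
use strict;
use warnings;

our @load_log;   # hypothetical name: one entry per module the hook sees

BEGIN {
   unshift @INC, sub {
      my (undef, $loading) = @_;                 # $loading is e.g. File/Basename.pm
      my ($package, $filename, $line) = caller;  # who asked for the module
      push @load_log,
         sprintf '%s: loading %s from %s (%s line %d)',
                 scalar(localtime), $loading, $package, $filename, $line;
      return;    # empty list: let the normal @INC search continue
   };
}

use File::Basename;   # any module that is not loaded yet will do

print "$_\n" for @load_log;
```

Modules that File::Basename itself pulls in appear in the log too, provided they were not already in %INC.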

Q. So, @INC is more powerful than its sister %INC, which is just used for diagnostics.

A. Err, no. %INC is used by perl to see if a module is already loaded.

Q. And how is that useful?

A. We can force a module reload by removing its entry from %INC. Consider:
In one part of our code we want to force a different version of a module to be loaded (don't ask why).

  use strict;
  use warnings;                # DON'T use -w

  use A;
  use MyB;
  use C;

  A::mysub();                  # Original modules used

  delete $INC{'A.pm'};         # Force perl to reload

  unshift @INC,'mydir';        # Change @INC
  {
     no warnings 'redefine';   # No 'redefined' warnings
     require 'A.pm';           # Reload module
  }

  A::mysub();                  # Module in 'mydir' used

Work it out yourself!
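
For anyone who would rather watch it happen, here is a self-contained sketch. It writes two versions of a made-up module (Greeter.pm, a hypothetical name) into temporary directories, loads the first, then removes the %INC entry and loads the second. The middle require shows that perl does nothing while the %INC entry is still there.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write version 1 and version 2 of a hypothetical module into separate directories.
my %dir;
for my $version (1, 2) {
   $dir{$version} = tempdir(CLEANUP => 1);
   open my $fh, '>', "$dir{$version}/Greeter.pm" or die "Cannot write module: $!";
   print {$fh} "package Greeter;\nsub version { $version }\n1;\n";
   close $fh or die "Cannot close module file: $!";
}

unshift @INC, $dir{1};
require Greeter;                     # loads version 1 and sets $INC{'Greeter.pm'}
my $first = Greeter::version();

unshift @INC, $dir{2};
require Greeter;                     # no effect: %INC says it is already loaded
my $unchanged = Greeter::version();

delete $INC{'Greeter.pm'};           # forget that the module was loaded
require Greeter;                     # searches @INC again and finds version 2
my $second = Greeter::version();

print "first=$first unchanged=$unchanged second=$second\n";
```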

Thursday, May 04, 2006

Splitting whitespace

The documentation for Perl is quite good usually, but when it comes to perldoc -f split there is so much magic involved that normal English begins to break down. Take the default arguments, for example.

"If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace)."

Then, later in the documentation:

"As a special case, specifying a PATTERN of space (' ') will split on white space just as "split" with no arguments does. … A "split" with no arguments really does a "split(' ', $_)" internally."

So, how many whitespace characters is that? A single space as a delimiter implies a single space is used for the split, but it actually does 'one or more whitespace':

   $_ = '    This    is    some    text';
   @a = split;
   $" = '|';
   print "@a\n";

Produces

   This|is|some|text

So leading whitespace is ignored, and one or more whitespace is used as a delimiter. There is a (documented) subtle difference with \s+:

   $_ = '    This    is    some    text';
   @a = split /\s+/;
   $" = '|';
   print "@a\n";

Produces:

   |This|is|some|text

Notice that the first element of the resulting list is empty, which was not previously the case.

A few questions arise from this. First, what is this ' ' syntax all about? Don't we need a regular expression match?

   $_ = 'xxxThisxxisxxsomexxtext';
   @a = split 'x';
   $" = '|';
   print "@a\n";

Produces:

    |||This||is||some||te|t

So it does work, but not in the same way as a single space: 'x' does not match 'one or more' (x+), so the space really is magic. To be fair, the documentation does say that ' ' is a special case. But it does not show the syntax of a string literal; it specifically shows a PATTERN delimited with / /. A quoted string, single or double, is nevertheless accepted as the pattern and treated as a regular expression, multiple characters included (without a leading 'm'). Other delimiter characters do not work unless preceded by 'm'.

Second question. What does whitespace mean? Is ' ' the same as \s in this case? Normally, of course, it is not, but in this case it is! ' ' is very special.
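
The three cases can be put side by side. This is just a sketch with a made-up test string containing a tab and a newline as well as spaces; it counts the fields each form of split returns.

```perl
use strict;
use warnings;

my $text = "  This \t is\nsome   text";

my @magic  = split ' ',   $text;   # special case: skip leading whitespace, split on runs of it
my @regex  = split /\s+/, $text;   # same delimiter, but the leading run yields an empty field
my @single = split / /,   $text;   # a real single-space pattern: every space is a delimiter

printf "' '   gives %d fields\n", scalar @magic;
printf "\\s+   gives %d fields\n", scalar @regex;
printf "/ /   gives %d fields\n", scalar @single;
```

The ' ' form returns 4 fields, /\s+/ returns 5 (the first one empty), and / / returns 8, because each individual space counts and the tab and newline are not delimiters at all.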

Thursday, April 27, 2006

Inside-out accessor methods - version 2

This solution uses a hash of hash-references, which means we no longer need the nasty eval. With thanks to the perl monks.


use Carp;                        # provides carp
use Scalar::Util qw(refaddr);    # provides refaddr

# One hash per attribute, each keyed by the object's address
my (%speed, %reg, %owner, %mileage);
my %hashrefs = ( speed   => \%speed,
                 reg     => \%reg,
                 owner   => \%owner,
                 mileage => \%mileage);

sub set {
   my ($self, $attr, $value) = @_;
   my $key = refaddr $self;

   if ( !exists $hashrefs{$attr} ) {
      carp "Invalid attribute name $attr";
   }
   else {
      $hashrefs{$attr}{$key} = $value;
   }
}
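
To show the idea end to end, here is a sketch of how set might sit in a complete class. The package name Car, the constructor, get and DESTROY are all assumptions added for the illustration; only set comes from the post above.

```perl
package Car;                       # hypothetical class name
use strict;
use warnings;
use Carp;
use Scalar::Util qw(refaddr);

my (%speed, %reg, %owner, %mileage);
my %hashrefs = ( speed   => \%speed,
                 reg     => \%reg,
                 owner   => \%owner,
                 mileage => \%mileage);

sub new {
   my ($class) = @_;
   return bless \do { my $anon }, $class;   # the object itself holds no data
}

sub set {
   my ($self, $attr, $value) = @_;
   if ( !exists $hashrefs{$attr} ) {
      carp "Invalid attribute name $attr";
      return;
   }
   $hashrefs{$attr}{ refaddr($self) } = $value;
   return;
}

sub get {
   my ($self, $attr) = @_;
   if ( !exists $hashrefs{$attr} ) {
      carp "Invalid attribute name $attr";
      return;
   }
   return $hashrefs{$attr}{ refaddr($self) };
}

sub DESTROY {                      # tidy up, or the attribute hashes grow forever
   my ($self) = @_;
   delete $_->{ refaddr($self) } for values %hashrefs;
   return;
}

package main;

my $car = Car->new;
$car->set(speed => 70);
$car->set(owner => 'Fred');
print $car->get('owner'), ' is doing ', $car->get('speed'), "\n";
```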

Wednesday, April 26, 2006

UNIX commands to Perl

This is a rough conversion table from UNIX commands to Perl.
Please note that there are not always direct single equivalents.


UNIX         Perl                         Origin
.            do                           built-in
awk          perl ;-) (often 'split')     built-in
basename     File::Basename::basename     Base module
cat          while(<>){print}             built-in
             ExtUtils::Command::cat       Base module
cd           chdir                        built-in
chmod        chmod                        built-in
chown        chown                        built-in
cp           File::Copy                   Base module
             ExtUtils::Command::cp        Base module
date         localtime                    built-in
             POSIX::strftime              Base module
declare      see typeset
df           Filesys::Df                  CPAN
diff         File::Compare                Base module
dirname      File::Basename::dirname      Base module
echo         print                        built-in
egrep        while(<>){print if /re/}     built-in
eval         eval                         built-in
exec         exec                         built-in
             pipe (co-processes)          built-in
export       Assign to %ENV               Hash variable
             Env::C                       CPAN
find         File::Find                   Base module
ftp          Net::FTP                     Base module
function     sub                          built-in
grep         see egrep    
integer      int                          built-in
kill         kill                         built-in
ln           link                         built-in
ln -s        symlink                      built-in
ls           glob                         built-in
             opendir/readdir/closedir     built-in
             stat/lstat                   built-in
mkdir        mkdir                        built-in
mkdir -p     ExtUtils::Command::mkpath    Base module
mv           rename                       built-in
             ExtUtils::Command::mv        Base module
od           ord                          built-in
             printf                       built-in
print        print                        built-in
printf       printf                       built-in
rand         rand                         built-in
rm           unlink                       built-in
             ExtUtils::Command::rm        Base module
rm -rf       ExtUtils::Command::rm_rf     Base module
sed          s/// (usually)               built-in
sleep        sleep                        built-in
             alarm                        built-in
sort         sort                         built-in
source       do                           built-in
times        times                        built-in
touch        open()/close()               built-in
             ExtUtils::Command::touch     Base module
trap         %SIG                         Hash variable
             sigtrap                      pragma
typeset      my                           built-in
typeset -i   int                          built-in
typeset -l   lc                           built-in
typeset -u   uc                           built-in
typeset -Z   sprintf                      built-in
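
A few rows of the table in action, as a rough sketch only. The file names are made up and everything happens in a temporary directory, so nothing real is touched.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);
use File::Copy qw(copy);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# touch a.txt
open my $fh, '>', "$dir/a.txt" or die "touch failed: $!";
close $fh or die "close failed: $!";

copy("$dir/a.txt", "$dir/b.txt")   or die "cp failed: $!";   # cp a.txt b.txt
rename("$dir/b.txt", "$dir/c.txt") or die "mv failed: $!";   # mv b.txt c.txt
unlink("$dir/a.txt")               or die "rm failed: $!";   # rm a.txt

# ls
opendir my $dh, $dir or die "ls failed: $!";
my @listing = sort grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;

my $base   = basename("$dir/c.txt");   # basename
my $parent = dirname("$dir/c.txt");    # dirname

print "ls: @listing\n";
print "basename: $base\n";
```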

Tuesday, April 11, 2006

Sad....

Walking across London I spotted a poster which screamed:
100 Gig tickets to be won!
"Must be a big stadium", I thought....

Accessor methods - update

I have a much better solution, with the help of the perlmonks. I'll post it when I get the chance.

Thursday, March 23, 2006

Accessor methods for inside-out objects

I have just been looking at Class::Accessor, recommended by Simon Cozens at http://www.perl.com/pub/a/2006/01/26/more_advanced_perl.html. It automagically generates accessor methods, but appears to rely on a %fields type approach, and will not work on inside-out objects.
I got to thinking. Actually, generalised accessors for inside-out objects are not that difficult, but need some ev[ai]l doings.


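   use strict;                     # with strict, an unknown name fails to compile in the eval
   use Carp;                       # provides carp
   use Scalar::Util qw(refaddr);   # provides refaddr

   my %speed;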
   my %reg;
   my %owner;
   my %mileage;

   sub set {
      my ($self, $attr, $value) = @_;
      my $key = refaddr $self;

      my $hashref;
      eval "\$hashref = \\\%$attr";

      if ( !defined $hashref ) {
         carp 'Invalid attribute name';
         return;
      }
        
      $hashref->{$key} = $value;
   }

   sub get {
      my ($self, $attr) = @_;
      my $key = refaddr $self;

      my $hashref;
      eval "\$hashref = \\\%$attr";

      if ( !defined $hashref ) {
         carp 'Invalid attribute name';
         return;
      }
        
      return $hashref->{$key};
   }

BTW: I also found how to do formatting: <pre>...</pre>

Wednesday, March 08, 2006

More of the same..

I haven't posted for a while. It's not that nothing has happened, it's just that too much has! I'm afraid that the blog suffers when I'm busy. I have been madly updating our 'Advanced Perl with CGI and Web Applications' course, and I am quite pleased with the result. As always I reckon I learnt at least as much as the delegates will (although maybe on more obscure subjects).
For example, did you know you can use $SIG{__DIE__} and $SIG{__WARN__} handlers to trap Carp::croak and Carp::carp calls?
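
A minimal sketch of that, with made-up names (risky and @trapped): carp ends up as a warn and croak as a die, so the two pseudo-signal handlers see both. Note that the __DIE__ handler is called even though the croak happens inside an eval, and the eval still catches the error afterwards.

```perl
use strict;
use warnings;
use Carp;

my @trapped;   # hypothetical name: messages seen by the handlers

local $SIG{__WARN__} = sub { push @trapped, "warn: $_[0]" };
local $SIG{__DIE__}  = sub { push @trapped, "die: $_[0]" };

sub risky {
   carp 'something looks odd';    # reported through the __WARN__ handler
   croak 'giving up';             # reported through the __DIE__ handler, then dies
}

eval { risky() };
my $error = $@;                   # the croak message, caught by eval as usual

print @trapped;
print "eval caught: $error";
```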

I returned home from teaching in the USA last Saturday to my latest multiple order from Amazon:
Object Oriented Perl by Damian Conway. A little out of date (2000) but good nonetheless, written in typical Damian style. Hacking the Advanced Perl course (two chapters on OO) and reading this book has totally changed my views on Perl as an OO language. The book does not contain anything on inside-out objects, but you can see that Damian was almost there. We now have a chapter on this simple but effective means of encapsulation in the Advanced course.

Number two book is the "Pickaxe book", Programming Ruby. About time I read it, more on that later (hopefully).

Number three is "PHP and MySQL Web Development". A huge tome which (unlike OOP and Ruby) does not look like 'fun'.
Last but not least another O'Reilly pocket book, this time the Linux one. All these pocket books, I need huge pockets. Not to mention another bookshelf....

Monday, January 16, 2006

What a refreshing change

I'm on the West coast of Ireland, Galway, teaching. I don't think anyone will be offended if I say that Galway is a little remote.
The air and the Guinness are both refreshing, but what stopped me in my tracks is the co-operation between businesses. One major international company booked me to give a course, but then said to others in Galway, "Hey, we have a technical course running this week, anyone else interested?", and so along come a few guys from other companies that could not justify the course on their own. Apparently this happens all the time.
These companies are major internationals acting like sensible human beings. Now why can't that happen everywhere?

Wednesday, January 04, 2006

IFS and read

I find myself continually underestimating the Korn shell. I always thought that Bash was neat because we could read into an array with read -a, but of course we can do the same in ksh with read -A. The effect is to split up the input into fields around $IFS. But did you know we can set IFS just for a read statement? Look carefully:

IFS=','
while IFS=':' read -A line
do
    end=$(( ${#line[*]} - 1 ))
    if [[ ${line[$end]} != /sbin/nologin ]]
    then
        echo "${line[*]}"
    fi
done < /etc/passwd

(display all the lines in /etc/passwd whose last field is not /sbin/nologin, changing the field delimiter from colon to comma).

The first IFS=',' sets IFS for the echo expansion and is not overridden by the IFS on the read line. That IFS only applies for the read, nothing else.

By the way, did you know that the expansion (including $*) uses the first character from IFS, so order matters? So IFS=',.:' gives different results to IFS=':,.'.

Setting IFS for a single command and the first-character rule apply to the standard POSIX shell, as well as Korn shell and Bash (the array options to read are extensions), and I was alerted to these features by Classic Shell Scripting (see a previous post).
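
For comparison, the same filter can be sketched in Perl, where split plays the part of IFS on the read and the list separator $" plays the part of IFS in the expansion. The records below are made-up sample data standing in for /etc/passwd.

```perl
use strict;
use warnings;

# Made-up records in /etc/passwd format
my @passwd = (
   'root:x:0:0:root:/root:/bin/bash',
   'bin:x:1:1:bin:/bin:/sbin/nologin',
   'alice:x:500:500:Alice:/home/alice:/bin/ksh',
);

my @kept;
for my $record (@passwd) {
   my @fields = split /:/, $record;              # like IFS=':' read -A line
   next if $fields[-1] eq '/sbin/nologin';       # last field is the login shell
   local $" = ',';                               # like IFS=',' for "${line[*]}"
   push @kept, "@fields";
}

print "$_\n" for @kept;
```

As with IFS on the read line, the change to $" is confined: local restores the old value at the end of each pass through the loop body.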

Finally, a significant bug in pdksh has been fixed in the version with Fedora Core 4 (1993-12-28 q). In early versions a pipe generated a child process on each side, so piping into a while loop was not useful: the while loop ran in a different process to the rest of the script, and so variables could not be saved. Now it all works. Consider:

#!/bin/ksh

ps -ef | while read uid pid ppid c stime tty time cmd
do
    if [[ $cmd == *xinetd* ]]
    then
        break
    fi
done

echo "Pid of xinetd is: $pid"

Unfortunately this will still not work in Bash, for the same reason as it did not work in the old pdksh. In Bash we can use process substitution instead, and while we are about it we may as well use an ERE (because we can):

# Keep the ERE in a variable: Bash 3.2 and later treat a quoted pattern as a literal string
re='^xinetd +'
while read uid pid ppid c stime tty time cmd
do
    if [[ $cmd =~ $re ]]
    then
        break
    fi
done < <(ps -ef)

echo "Pid of xinetd is: $pid"

The man pages for the new pdksh say process substitution works in Korn shell as well, and I am told this works on Solaris, but it does not appear to work on Linux.
