The Darke Side: 2005

Friday, December 30, 2005

Toys for boys

The weather has been kind over Christmas, particularly clear skies for my *new* Meade ETX 105 - that's a telescope by the way. Not exactly the largest - only 4" and yes, size does matter, but I couldn't really justify a larger one given that it usually rains on the rare occasions when I am actually at home.
Good views of Mars in November and early December, now good views of Saturn, and various nebulae and clusters like M42 and M37. OK, so what's that got to do with programming and stuff?
Well, most modern telescopes have an electronic control called a "goto" system - you get the idea. This is normally controlled by a hand-held device, but can be controlled by a PC using an Open Source object model named ASCOM - Astronomy Common Object Model.
http://ascom-standards.org/faq.html
Currently it appears to be written using COM, which of course is proprietary and realistically only runs on one of the many Microsoft operating systems. OK, you can get it to run on others, but would you want to, really?
This is one I will be looking at further.

Wednesday, December 21, 2005

Switching off terminal echo in Perl

A delegate recently asked me how to switch echo off for password prompting. I couldn't remember, I thought it was Term::ReadLine. Actually it is Term::ReadKey - as expected the Perl Cookbook reminded me. Term::ReadKey is not a base module, but is present on ActiveState and Fedora installations, and easily installed from CPAN. Here is a simple script that works on both Linux and Windows (despite what the documentation says) :

use Term::ReadKey;

print 'Password: ';
ReadMode 'noecho';
my $password = ReadLine 0; # 0 = normal blocking read of a whole line
ReadMode 'normal';
chomp $password;

print "Password was: $password\n";

Another delegate said something like "I guess another module will replace characters typed with asterisks". Well, that is not so simple. After some experimentation I came up with the following script, which has to use raw terminal input and take into account differences between Windows and Unix/Linux:

use Term::ReadKey;
use strict;

local $| = 1;
print 'Password: ';

my $password = '';
my $retn = $^O eq 'MSWin32'?"\r":"\n";

ReadMode 'raw';

while (1)
{
    my $key;
    # Loop waiting for a key to be pressed
    while (!defined $key)
    {
        $key = ReadKey -1;
    }

    last if $key eq $retn;
    $password .= $key;

    print '*';
}

ReadMode 'normal';
print "\n";

print "Password was: $password\n";

Check the Term::ReadKey documentation for the arguments. Note that I have to wait for a key to be pressed this time, and write an unbuffered '*' (local $| = 1 makes STDOUT unbuffered).

Wednesday, November 16, 2005

Perl subroutine parameter passing - again

One of the reasons Damian Conway does not like prototypes is that (his words) they do not give the results expected. Well, what results do you expect when you don't use them?

It is sometimes said that parameters are passed into Perl subroutines by copy, and some say they are passed by reference. In fact they are passed by magic, and as we all know, any technology sufficiently advanced is indistinguishable from Perl.

What do you expect to be printed?

sub mysub
{
@_ = qw(one two three)
}

@array = qw(The quick brown fox);
mysub (@array);
print "@array\n";

We get 'The quick brown fox'. The array is unchanged, so you might think the array is passed by value. The call:

mysub (qw(The quick brown fox));

also works fine, so no problem. Let's change the subroutine to use a slice instead:

sub mysub
{
@_[0,1,2] = qw(one two three)
}

Now we get 'one two three fox', the array is changed! Changing the elements changes the caller's array, changing the whole array does nothing! Now the line:

mysub (qw(The quick brown fox));

fails with "Modification of a read-only value attempted…"

This is a run-time error, whereas if you use a prototype:

sub mysub (\@)

we get a compile-time error:

"… must be an array (not list)…"

If I preferred run-time over compile-time errors I would be using the Korn shell.

Once the prototype has been specified it does not matter if we alter the caller's array in total or use a slice – the caller's code is always the same.
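The three behaviours can be put side by side in one short script. This is only a sketch: the subroutine names (replace_list, replace_elements, replace_via_ref) are made up for illustration, and the last one assumes the (\@) prototype shown above.

```perl
use strict;
use warnings;

# Assigning to @_ as a whole replaces the local list; the caller is untouched.
sub replace_list     { @_ = qw(one two three) }

# Assigning to elements of @_ writes through the aliases to the caller's data.
sub replace_elements { @_[0,1,2] = qw(one two three) }

# With a (\@) prototype the sub receives a reference to the caller's array.
sub replace_via_ref (\@) { my ($aref) = @_; @$aref = qw(one two three) }

my @a = qw(The quick brown fox);
replace_list(@a);
print "@a\n";        # The quick brown fox

my @b = qw(The quick brown fox);
replace_elements(@b);
print "@b\n";        # one two three fox

my @c = qw(The quick brown fox);
replace_via_ref(@c);
print "@c\n";        # one two three
```

The caller's code is identical in all three cases, which is rather the point: you cannot tell from the call what will happen to your array.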

OK, I admit that altering @_ directly is damn awful code, and it breaks another of Damian's rules. He's not all bad ;-)

I don't believe that an inconsistency in one part of Perl (prototypes) is a justification to switch to an inconsistency in another (pass by magic). Just admit the inconsistencies and celebrate them – or move to Perl 6.

Thursday, October 20, 2005

Perl 6 at EuroOSCON

This is a brief summary of my notes from the Perl 6 and pugs sessions. I have only included the stuff that was new compared to the Perl 6 Appendix in the 'Perl Programming' course. I might have repeated myself.

use strict and use warnings are on by default;

Quotes are only required around a hash key when {} are used.

$hash{'key'} becomes %hash<key>

Barewords are not allowed. Ever. Even for file handles.

fail{…} warn{…} die{…} throw a 'not yet implemented' type warning – the Perl 6 developers must be using that a lot ;-)

Many improvements to interpolation:

"{ executable code like a do block} "

Subroutine calls are allowed: "&mysub(args)"

Interpolate an array: "@array[]"

Interpolate a hash: "%hash{}"

sprintf is probably obsolete, printf definitely is:

print $var.as('%6.2f');

print $hash.as("%-20s: %2.6f", "\n");

Control can be made over exactly what is interpolated:

qq:c(0)/ / # don't interpolate

qq:s(1)/ / # only interpolate scalars

qq:s(1):a(1)/ / # interpolate scalars and arrays

hummmmmmm

Here document syntax is changed:

my $var = q:to /END/ # s(1):a(1):h(1) can be added

…

END

Not sure where the semi-colon goes

Ranges have a new feature:

x..^y means from x up to y-1

x^..^y means from x+1 to y-1

To open a file with an automatic chomp:

my $fh = open $filename

This can be overridden.

while (<$fh>) {…}

becomes:

while = $fh {…}

while (<>){…}

becomes

while = $ARGS{…} or while =<> {…}

=<> is known as the 'fish' operator

There is now a 'prompt' verb

Most built-ins that operate on scalars and arrays are now methods:

$var.substr(…) # returns the substring

$var .= $var.substr(…) # changes the substring

No need for Data::Dumper, instead:

$var.perl()

And there is much more. Damian could not complete his talk because of time, and it is not downloadable. I'll keep you posted as I play with pugs.

Wednesday, October 19, 2005

EuroOSCON - whatever. Is Perl 6 too complex?

I asked the main man, yer actual Larry Wall, this question. Poor guy - he is a very nice chap - thought he was just signing a book for me. There was no one else around, so I consulted the oracle (I won't use an uppercase O, you'll get the wrong idea).

Q. Will Perl 6 still be suitable for guys who just want a better language than ksh or .bat files?
A. The easy things will still be easy

Q. Will Perl 6 still be suitable for sys.admins?
A. Hey! I'm still a sys.admin.
Me: No you are not, you are a language demi-god.
A. Shrug. Sys.Admins will still be able to use it.
Q. Are you saying that because that is what I want to hear?
A. Sys.Admins will still be able to use it.

Thank-you very much.
We also discussed training strategies for Perl 6, and even for Perl 5, and I wasn't that hard on him. Larry Wall is a very nice chap (oh, I said that).

EuroOSCON 6 - Perl 6 Update

Well I did type all this in, but the network broke when I sent it, and it was lost. You will have to wait.

EuroOSCON 5 - Installation Installation Installation

I saw a demo today of a brilliant web-based Perl debugger called Devel::ebug::HTML. It uses the browser, and appears to have everything you want from a visual debugger.
BUT...
Trying to install it is a nightmare. There is no README file, so no list of dependencies. I start downloading and installing them by hand and choke when I come to Catalyst. So I switch off my firewall (oooooooh) and run cpan, and nothing installs. Tried ppm - the same. Everything downloads OK, but every single one fails on the make - it should be using nmake on Windows.
Doing this manually would take for ever, so I give up.

Other Open Source products seem to suffer from the same thing, and it is about time the community did something about it. OK, that means you and me. Something is being done about it, there are many people working on this. I appreciate the complexity, but I don't think the underlying issue is technical. It is trying to bring order to the bazaar, and I am not sure that is possible without turning it into M&S.

Tuesday, October 18, 2005

EuroOSCON 4 - 'Ere we go, 'Ere we go, 'Ere we go!

Various speakers and sessions attended (more tonight), here is a summary:

O'Reilly

Book sales year on year current figures:
C# down 3%
Python up 30%
VB down 1%
Perl up 4%
Ruby up > 1000%
Linux(generic) down 4%
Red Hat down 24%
Other Linux/Unix up 2,807%

Now does that tell several stories?

Waugh

Microsoft dialog boxes: Q. "Would you like tea or coffee?

The other talks were just saying what a wonderful thing Open Source is, and how it is the future. Yeh Yeh Yeh.

Perl 6

Lots of detail, I have to sort out my notes in detail - another post tomorrow.
Damian and Larry's presentation was titled "Perl 6: End-game"
Damian Conway is a trainer, and is giving 2-day Perl 6 courses NOW, and expects to be giving full blown Perl 6 courses next summer.

Have more sessions this evening, tomorrow is quieter.

EuroOSCON 3 - Perl Best Practices (2) - prototypes

Like the true geek I am, this has kept me awake all night.

Damian Conway is dead against prototypes, and says they should all die a horrible death in a pit filled with spikes.
And snakes (I forgot about the snakes).

Prototypes on $@% are rubbish – I agree, but \$\@\% have some advantages.

Take the swap_arrays code in his book (p.196), let's think of some simple user mistookes:

Without prototypes:

swap_arrays(@sheep, @goats); # Missed out the \

Without prototypes, provided you are using strict, you get:

Can't use string ("Blackface") as an ARRAY ref while "strict refs" in use at …

("Blackface" is a sheep, we know about these things where I do live)

BUT the error is given as being in the subroutine, not at the call (where the actual error is). perl also stops right there, it does not detect that the next argument is wrong as well – we need to do another run to find that.

Mostly if we supply the wrong type, use strict 'refs' will save our bacon (or mutton). BUT:

swap_arrays([qw(one two)], [qw(three four)]);

does not give us an error, surprisingly. OK, at the end of the day we should not be calling a subroutine we do not understand, but if a user can, she will. If she thinks that the two lists are flattened (Damian's assumption with prototypes), then she clearly does not understand the thing anyhow.

With prototypes:

swap_arrays(\@sheep, \@goats); # Added a \

With a prototype we get an error for each argument at fault:

Type of arg 1 to main::swap_arrays must be array (not reference constructor)

Type of arg 2 to main::swap_arrays must be array (not reference constructor)

Execution of C:\proto.pl aborted due to compilation errors.

This time the error is reported correctly.

Also, with:

swap_arrays([qw(one two)], [qw(three four)]);

and a prototype we get a similar error message, again identified with the correct line number.
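To see both cases in one place, here is a small sketch. The swap_arrays below is my own stand-in with a (\@\@) prototype, not the code from the book, and the animal names are only sample data. The bad call is wrapped in a string eval purely so the script can print the compile-time message instead of stopping.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the book's routine: the (\@\@) prototype makes
# perl pass references to the two named arrays.
sub swap_arrays (\@\@) {
    my ($first, $second) = @_;
    my @tmp  = @$first;
    @$first  = @$second;
    @$second = @tmp;
    return;
}

my @sheep = qw(Blackface Herdwick);
my @goats = qw(Toggenburg Saanen);

swap_arrays(@sheep, @goats);    # no backslashes needed at the call
print "sheep: @sheep\n";
print "goats: @goats\n";

# Passing anonymous arrays is rejected when the call is compiled.
my $ok    = eval 'swap_arrays([qw(one two)], [qw(three four)]); 1';
my $error = $ok ? '' : $@;
print "compile error: $error";
```

The error text names the offending argument and the line of the call, not a line inside the subroutine.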

And another thing

And I nearly forgot, empty prototypes are the only way I know (other than hand-coding) of preventing any arguments being passed, as in:

sub void_sub () { ... }

Calling this subroutine would give an error at the point of the erroneous call.
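A minimal sketch of that, with a made-up body for void_sub. Again the erroneous call sits inside a string eval, only so that the compile-time error can be captured and shown rather than aborting the script.

```perl
use strict;
use warnings;

# void_sub is the name used above; the body is invented for illustration.
sub void_sub () { return 'no arguments here' }

print void_sub(), "\n";          # fine: called with no arguments

# Passing an argument is caught when the call is compiled.
my $ok    = eval 'void_sub(42); 1';
my $error = $ok ? '' : $@;
print "Rejected: $error";
```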

The Volcano beats Joe

Let's face it though, Damian is right. If you want a standard there cannot be exceptions. The problem with prototypes is that everyone thinks they are like C/C++ prototypes, but most people don't even understand THEM. Take:

size_t myfunc (char string[4]);

int another ();

Neither does what most people think it does, and both are misleading. So how do you expect to understand Perl prototypes on the same (false) basis? KISS.

Monday, October 17, 2005

EuroOSCON 2 - Perl Best Practices

Damian Conway's talk on this subject woke me up.
A lot of great ideas and tips, methinks I need an extra slide at the end of each chapter on style, or something like it. Do you think people will mind a six day course? That's the problem, I keep picking up more stuff to put into the course!

At least I found something to take out - subroutine prototypes.
A packed room, with lots of experts, and it appears I was the only person who understood prototypes (except Damian, of course). That is scary, and probably indicates we should not spend so much time on them in our course. Damian hates them, I just find them 'flaky' personally - but no more flaky than lots of other parts of Perl. Then again, it is Damian's mission to destroy flaky Perl, and we have to start somewhere...

I got a chance for a brief chat with Mr. Conway (he comes from that country which just lost the Ashes), but I have more questions for him. Here is a reminder for me:

Why use -t *STDIN instead of STDIN?
Shouldn't IO::Prompt offer the locale specific Y/N?
An idea for a pragma - RegExp::NoCapture - make all parenth. groups non-capturing. Could REs be optimised better?

Some good speakers tonight - Damian Conway (again) and Larry Wall - really looking forward to that.

BTW - am I grateful for downloading FireFox? The W/Lan here does not work with MS Internet Explorer. Apparently it is the same problem as I mentioned with ActiveState (see below), or so they say.....

EuroOSCON 1 - DBI

This conference is so cool, it is Arctic.

Went to the Database Tutorial this morning - another chapter to re-write! Not that there is anything wrong with the Perl Programming Appendix on DBI, but I just learnt so many cool things, that the chapter will probably double in size.
So what did I find out?
First, my Perl knowledge is not so bad ;-) It is always worrying going up against a bunch of enthusiastic geeks, but hey, I don't think I let the side down. That was a confidence boost (even lecturers need that sometimes)
Next: dbi:DBM: is a prototyping database bundled with DBI, looks great for quick training demos
Next: queries can be done against a dbh, but this is not efficient (we don't mention that)
Next: if $dbh->disconnect is not done, you can get a memory leak
Next: placeholders are very cool if you know how to use them (I feel lots of example slides coming on)
Next: I had missed dbh attributes before (set in the connect)
Next: Lots of things about RaiseError
Next: $sth metadata hashes are tied, so using these invokes method calls
Next: and other stuff as well

A lot I already knew, like American jokes don't work on a European audience.

Friday, October 14, 2005

Fedora 4 /etc/profile

Found a buglet in this file this week.
The /etc/profile file is the first file executed by a Korn shell session on login, but it has been hijacked by Bash. On Linux the Korn shell is not used much, except by training companies of course. I doubt that /etc/profile is ever tested with ksh, so it is not surprising there is a bug, however it is a shame that the mistake is such a simple one - doesn't give one much confidence.

On sign-on to ksh we get a syntax error in /etc/profile, on the line:

if [ $EUID = 0 ]; then

The EUID shell variable is set by Bash, and is a read-only value giving the effective user-id (if you want to know the difference between effective and real user-ids, come on QA's Unix Programming course). The code is checking for a 'root' user. On Korn shell this variable is not set.

The fix is quite easy, quote the variable:

if [ "$EUID" = 0 ]; then

ALWAYS quote variables in single [. One of the effects of using [[ is that quotes are not necessary around a variable, and if the authors had come on QA's Mastering BASH Shell Scripts course, they would have known that.

Tuesday, October 04, 2005

Sneaky bat files

The ActiveState Perl implementation contains some examples of imbedding Perl inside a Windows .bat file. Of course if we wanted to do that sort of thing in a Unix shell we would probably use a 'here' document, but bat files do not have that level of sophistication.

The structure of these bat files is as follows:

@rem = '--*-Perl-*--

perl -x -S "%0" %*

goto endofperl
@rem ';

#!/usr/local/bin/perl -w
#line 9
... Perl code ...
__END__
:endofperl

The "%0" is the name of the current bat file, and %* is a list of the arguments passed into the bat file (like $0 and $* in the Korn shell and Bash).

So we are executing the bat file as if it was a perl script! This is where the -x option to perl comes in. Check perlrun: -x means that the perl code starts at the #! line, and preceding lines are to be ignored. The -S option means that %path% is used to find the perl script. There is a problem with this.

The quotes around "%0" would appear to protect us from imbedded spaces in the filename, but they do not. Paths with imbedded spaces do not work with this method. Strangely, this works if the quotes are omitted. The implication is that most of the ActivePerl bat files do not work with imbedded spaces in path names.

(The pugs Windows installation has a similar problem)

The -S option is not really needed. This is because to run the bat file we need to specify a hierarchic or full pathname anyway (Explorer makes the script's directory its current directory).

The sneaky part is the use of @rem.

The bat file starts with a non-echo'ed remark:

@rem = '--*-Perl-*--
… bat file stuff …
@rem ';

The idea here is that, by a happy coincidence, the @rem comment prefix in a bat file is a valid perl command, to assign to an array called @rem. The assignment ends on the final comment, with a close quote (notice that the first @rem has no close quote).
I can only guess that this is in the bat files for historical reasons, since the -x option to perl makes this trickery unnecessary. The goto (whatever that is) is required to leap over the perl code.

The #line directive sets the line number when error reporting, or for the die statement. This means that we can set the line numbers relative to the start of our bat file, rather than the start of the perl code.

Finally, the __END__ label tells perl that this is the end of the script, and ignores any junk that follows.
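The effect of the #line directive is easy to check from plain Perl, outside any bat file. A minimal sketch (the number 200 and the variable names are arbitrary choices of mine):

```perl
use strict;
use warnings;

my $real = __LINE__;     # the genuine line number within this file

#line 200
my $faked = __LINE__;    # the directive above makes perl call this line 200

print "real: $real, after directive: $faked\n";
```

Anything reported by warn or die after the directive is numbered from the new value in the same way.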

A link of interest

Will Linux Benefit from Microsoft's SNAFU in Massachusetts? by Tom Adelstein -- Microsoft had everything to lose in Massachusetts and did.

Friday, September 23, 2005

Unix Shell Scripts

A weekend between giving courses, thankfully back to talking Perl next week.
I have been teaching our Korn shell course this week and I enjoyed it, but....
Why do so many people think that Korn shell and Bash are the same?
Bash is hugely more powerful than ksh, with Extended Regular Expressions in version 3, and possibly associative arrays soon (looking at the source, it seems most of the code is already there). There are still things missing, for example ERE matching is supported, so wouldn't a substitution statement be a small further step?
Yes I know, danger of feature bloat. With a little more work Bash could be the new Perl 5 ;-) There might actually be a place for that once (if) Perl 6 takes hold (double smiley).

Funny how conventional programmers tend to look down on shell scripting. And funny how system administrators, and their ilk, are expected to write scripts with no training on good programming techniques and practice. Companies spend huge amounts making sure their applications are fast and efficient, then some poor erk runs a badly written script and screws the entire system. Is it the erk's fault? Maybe, but more likely the person who put them in that position.

Here is my contention: anyone can write a shell script, but it takes skill and knowledge to write a good one.

I recommend: Classic Shell Scripting by Robbins and Beebe (O'Reilly) May 2005. The best book on the subject I have come across.

Thursday, September 08, 2005

Redemption

Microsoft, good ol' Microsoft, have a bunch of (useful) Perl scripts on:
http://www.microsoft.com/technet/scriptcenter/scripts/perl/prlindex.mspx
They only use Windows Management Instrumentation (WMI), and they are a bit thin on explanation of the code, but it's a start. One day, maybe, I will get to write my "Perl Programming for Microsoft Windows" course.

Wednesday, September 07, 2005

Bye bye IE, hello Firefox

After downloading ActiveState Perl 5.8.7 I had problems displaying the html Perl documentation. Turns out (http://bugs.activestate.com/show_bug.cgi?id=39240) that the Internet Explorer patches with XP SP2 "lock down" some features, like actually displaying web pages. Sorry, what did you say a browser is for?

I'm all for security, but preventing access to web pages on the local machine, even when it is not connected to the web? Give me a break!

So, hello Firefox! I used it months ago, but never got around to installing it on my nice new machine. Well now I had a real need, and it works a treat! I am just ashamed I took so long.

So Mister Microsoft, if you want to know why I am not using your product, it is simply because it does not work - nothing personal ;-)

Wednesday, August 31, 2005

Win32::StreamNames

A truly excellent new module has just appeared on CPAN, named Win32::StreamNames. It returns a list of file stream names for a specified file.

OK, I'll come clean. Yes it's one of mine. I was proudly showing off my blanket trick (see below) to another QA instructor when he said "I suppose you use a stream". Well, aside from a small river, a stream to me is an IO channel, or maybe a Unix device driver STREAM. This is neither. Enhanced at NTFS5 at Windows NT 5, correction, Windows 2000 ;-), a Windows File object points to Stream control blocks, which can include a Named stream. The names are of the form file-name:stream-name:$DATA. They are not normally visible to Windows Explorer, the dir command, Perl globbing, or readdir (using Win32 API). BackupRead Win32 API can retrieve the stream names, although it is not easy to use. Hence the module.

See it at your local CPAN mirror today! http://search.cpan.org

Friday, August 19, 2005

Blanket

While on the train, I decided to amuse myself by writing a version of Damian Conway's 'bleach' program. Basically it "encrypts" a perl script with whitespace, so when it is viewed it looks blank.
The method I used (I couldn't remember Damian's exact code) is to take the ordinal number of each character in the character set, and write that number of space characters to the file.

For example, 'A' is decimal 65 in the character set (ISO Latin 1 and ASCII), so for that character I write a line containing 65 spaces. To get the number I use the Perl built-in function 'ord'. To decode I just have to get the length of each line and feed that into the 'chr' function to get back the original.
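The encoding and decoding steps might be sketched like this. It is a reconstruction of the idea as described, not the script I wrote on the train, and the function names blanket and unblanket are made up here.

```perl
use strict;
use warnings;

# Encode: one line per character, holding ord(char) spaces.
sub blanket {
    my ($text) = @_;
    return join '', map { (' ' x ord($_)) . "\n" } split //, $text;
}

# Decode: the length of each line is the character's ordinal number.
sub unblanket {
    my ($blank) = @_;
    return join '', map { chr(length $_) } split /\n/, $blank;
}

my $source = qq{print "Hello\\n";\n};
my $blank  = blanket($source);

print "encoded text is all whitespace\n" if $blank !~ /\S/;
print "round trip OK\n" if unblanket($blank) eq $source;
```

One known gap in this sketch: a trailing NUL character would encode to an empty last line, which split drops, so it is only suitable for ordinary text.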

I showed this to my class as an amusement, and one delegate (thank-you Daniel) played around with it. He discovered that if you zip the blanked file you get a very small archive, smaller (about two-thirds the size) than the original, zipped. Must be the way the zip algorithm works with repeated characters.

I have just thought of another tweak I can do .............

Saturday, June 04, 2005

pdksh integer arrays

Just found a bug in pdksh version 5.2.14 – it cannot handle integer arrays correctly (pdksh is the Public Domain Korn Shell). I first discovered this with a lottery script, but actually it is very easy to reproduce:
typeset -i array
array[0]=42
array[1]=43
array[2]=44
array[3]=45
echo "${array[@]}"

gives: 45 45 45 45

This works correctly if typeset -i is not used. Bash, with declare -ia array, works fine, as you might expect.

Friday, May 27, 2005

Unix for Perl Programmers

This is an introduction to using the Unix API from Perl.

It concentrates on POSIX compliant system calls, and introduces System V Release 4 Unix. I look at the fundamental concepts of a process and its interaction with the Unix kernel, the difference between library calls and system calls is discussed, and I give an overview of error handling. Later postings will look at API calls in detail.

I have assumed a degree of familiarity with Perl that can be obtained by attending the QA Perl Programming course.

Unix compatibility

Unix has a long and chequered history, with a number of standards which often conflict. An attempt to standardise application code interfaces into POSIX.1 (IEEE Std 1003.1, now the X/Open SUS) has not been universally adopted.
Use defensive programming techniques where possible, for example do not hard-code system limits, but call functions at run-time to find what these are (see later).
As Perl programmers we are protected from many of the differences between operating systems, but we are not immune from them. Most Perl statements are portable, but the lower we delve into the innards of Unix, the less portable we become. The differences are often expressed in C terms, even in the Perl documentation, so a knowledge of that language is very beneficial.
Use the built-in Perl variable $^O to mark blocks of code you know are specific to certain versions of Unix.
In this article I often give example code and function prototypes. Arguments and their types have been described in a generic way where possible, however you should not assume that they exactly match the systems you use, or the version of Perl.
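As a rough illustration of both pieces of advice, here is a sketch that asks for two limits at run time through the POSIX module and uses $^O to mark platform-specific blocks. The choice of limits, the '/' path and the two platforms named are only examples, and the messages are placeholders.

```perl
use strict;
use warnings;
use POSIX qw(sysconf pathconf _SC_OPEN_MAX _PC_NAME_MAX);

# Ask the system for its limits at run time rather than hard-coding them.
# Both calls return undef if the limit is indeterminate or the query fails.
my $max_open = sysconf(_SC_OPEN_MAX);
my $max_name = pathconf('/', _PC_NAME_MAX);

printf "Max open files per process: %s\n",
    defined $max_open ? $max_open : 'not available';
printf "Max file name length on /:  %s\n",
    defined $max_name ? $max_name : 'not available';

# Mark blocks that are known to be specific to one flavour of Unix.
my $os_note;
if ($^O eq 'linux') {
    $os_note = 'Linux-specific code would go here';
}
elsif ($^O eq 'solaris') {
    $os_note = 'Solaris-specific code would go here';
}
else {
    $os_note = "no special handling for $^O";
}
print "$os_note\n";
```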

Perl with Unix: call interfaces

The kernel is the heart of a Unix system. Its job is primarily one of resource management; it looks after the physical resources (hardware) of the computer system and shares them amongst all of the processes that wish to use them.
The kernel can be thought of as providing a number of services to programs, such as a file system for data and program storage, management of the RAM in the computer and a means of accessing devices attached to the computer. The details of the actual hardware are hidden from programs by the kernel; it presents a "virtual-machine" interface to the programs. The virtual-machine interface is consistent across almost all versions of Unix, regardless of hardware details.
Programmers will occasionally require the kernel to perform some function for them, such as performing input or output, or starting a new process. Such requests are made through a number of well-known special function calls known as system calls. The set of system calls defines the interface to the kernel for programs and is kept consistent as much as possible across different variants of Unix. Nevertheless, such low level access is by definition dependent on the operating system, and often requires detailed knowledge. To enable portability, and to supply easy-to-use interfaces, a higher level language like C provides its own RunTime Library (RTL). This takes an ISO standard call and converts it into an operating system specific call, and maybe adds other features as well. Other languages and products have their own RTLs, for example an RDBMS will have a RunTime Library to enable embedded SQL to run.

Perl is written in C, so uses the C RTL, and many of the built-in function calls reflect that. Sometimes these are too general, so we have available lower-level interfaces to make direct calls to kernel APIs, but remember that these were designed to be called from C, not Perl. Perl modules often make use of low-level calls, although they sometimes "cheat" by using imbedded C.

System Calls and Library Functions

Most Unix programs will need to make use of one or more library functions or system calls. While they may both appear to be the same from the program’s point of view, there are some important differences between library functions and system calls.

System calls are the means by which a process can request services from the kernel. Services include accessing and manipulating files and devices, and process-related operations. System calls are documented in section 2 of the Unix reference manual, and are platform specific. Some Perl built-ins make system calls, and appear to be the same as the kernel call. The read() function is an example. The Perl version will make a call to the C library function fread(3) eventually, but along the way adds value. It can handle utf8 files, for example. If we want the kernel function read(2) we need to call the Perl function sysread (there is also a syswrite and a sysopen). Here are some examples:

Perl        Kernel call    Library function

$$          getpid
fork        fork
open                       fopen
read                       fread
sysopen     open
sysread     read
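The two routes in the table can be compared on a scratch file. This is a sketch only: the temporary file and its contents are invented for the demonstration, and on a modern perl the buffered route goes through Perl's own I/O layers rather than necessarily through the C library.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use File::Temp qw(tempfile);

# Create a small scratch file to read back (the file name is generated).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "0123456789";
close $tmp_fh or die "close failed: $!";

# Buffered route: open and read.
my $via_read;
open my $buffered, '<', $tmp_name or die "open failed: $!";
defined read($buffered, $via_read, 4) or die "read failed: $!";
close $buffered;

# Unbuffered route: sysopen and sysread map onto open(2) and read(2).
my $via_sysread;
sysopen my $raw, $tmp_name, O_RDONLY or die "sysopen failed: $!";
defined sysread($raw, $via_sysread, 4) or die "sysread failed: $!";
close $raw;

print "read: $via_read, sysread: $via_sysread\n";
```

Both calls return the same four bytes here; the difference is in the buffering, so do not mix the two styles on the same file handle.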

Library functions are simply C functions written by someone else and supplied via libraries of functions. These libraries are linked to perl either statically (before the program runs) or dynamically (at run time). Using a library function does not necessarily require any help from the kernel.
C Runtime Library functions are documented in section 3 of the Unix reference manual, but Perl does not necessarily use them. The Library functions themselves can have portability problems, so Perl often implements its own. See perldoc perlclib.

The major portion of the work carried out by the various bodies that have developed standards for Unix, such as the IEEE (POSIX) and the Open Group (X/Open Portability Guide, Unix 1170, etc.) has been to standardise the names, arguments and results of library functions and system calls. The result is that the various versions of Unix that exist today should all contain a compatible set of these, forming a standard API supporting the development of applications that can be ported with minimal effort.

Perl system interfaces

There are a couple of different methods that may be used to imbed C calls into Perl. The first is the syscall interface. This allows a call to be made direct to a C function, assuming the function's prototype is defined in a header file. The h2ph utility will convert the header file to a Perl .ph file, which can then be loaded with use or require. Unfortunately the h2ph utility cannot cope with all structures and calls, so sometimes the resulting .ph files have to be "tweaked" manually to get them to compile. The most difficult part of using syscall is in converting the variable types between Perl and C, using pack and unpack. A detailed knowledge of C is required for this, as well as an understanding of the function you are about to call.
The XS interface is much easier to use, but still needs an understanding of C. Again we convert the header file, but this time using h2xs. We imbed C code in an XS code block, then build a new module using the XS compiler (called xsubpp). This time the conversion between C and Perl variables is done automatically.
There is a simple XS tutorial available in perldoc, called perlxstut, and the full XS documentation is in perlxs.

Converting between Perl and C

Perl variables are held in an "internal" format, and do not usually have a 1:1 mapping with C primitive types. However, many low-level functions make C system calls, and so we often need to supply a C struct from Perl.

We convert from Perl to C using the built-in function pack:

C scalar = pack TEMPLATE, LIST

where TEMPLATE indicates the type of each field (see below).

To convert from C to Perl we (unsurprisingly) use unpack:

Perl list = unpack TEMPLATE, SCALAR

There are over 30 different template characters, some common ones are:

A    A text (ASCII) string, will be space padded.
Z    A null terminated (ASCIZ) string, will be null padded.
c    A signed char value.
C    An unsigned char value. Only does bytes. See U for Unicode.
s    A signed short value.
S    An unsigned short value (16 bits).
i    A signed integer value.
I    An unsigned integer value.
l    A signed long value.
L    An unsigned long value (32 bits).
f    A single-precision float in the native format.
d    A double-precision float in the native format.
D    A long double-precision float in the native format.
x    A null byte.

For a full list see perldoc -f pack.
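A small round trip shows how the template drives both directions. The record layout here (an 8-byte name, a signed short and an unsigned long) is invented for the example and does not correspond to any real C struct.

```perl
use strict;
use warnings;

# A made-up record: a name padded to 8 bytes, a signed short, an unsigned long.
my $template = 'A8 s L';
my $record   = pack $template, 'fox', -42, 65536;

# The packed scalar is a plain byte string: 8 + 2 + 4 bytes, no padding.
printf "packed length: %d bytes\n", length $record;

# unpack with the same template recovers the values; 'A' strips the padding.
my ($name, $count, $size) = unpack $template, $record;
print "name=$name count=$count size=$size\n";
```

Note that pack adds no alignment padding of its own, so matching a real C struct usually means adding x bytes to the template by hand.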

The following example packs the flock structure needed to call fcntl for a file lock:

$wlock = pack 'sslli', (F_WRLCK, SEEK_SET, $rec, $rsize, 0);
fcntl( ... ) # Some C call

Unpack the returned flock structure:

($locked, $pid) = (unpack 'sslli', $qlock)[2,4];

The above examples show an interesting portability issue: they do not work on Linux 2.4. This is because the Linux implementation of the perl fcntl function does not call fcntl(2), but a lower level function, fcntl64. This means that the normal 'l', which is 32 bits, has to be doubled up:

my $wlock = pack 'sslllli', (F_WRLCK, SEEK_SET, 0,
$rec, 0, $rsize, 0);

(It took me ages to figure out that Linux perl uses fcntl64; I eventually ran strace(1) on my perl script to discover why it would not work.)

POSIX module

The POSIX module is an important tool for Perl programmers on Unix. The interfaces it provides are mostly portable, although you must still be aware of the limitations of your particular system. The POSIX 1 standard (there are actually 13 parts to POSIX) defines a C language interface, and so some functions are not applicable to Perl. Perl abstracts messy tasks like dynamic memory allocation, so strictly speaking the POSIX module does not, and cannot, provide true POSIX compliance, only POSIX functionality. Being a C interface does mean that knowledge of C is a definite advantage when understanding the documentation.
Some POSIX functionality is provided by standard built-in Perl functions and variables, and it is generally better to use those rather than the POSIX module versions, since Perl is optimised with them, but there is not a huge difference.
To use the POSIX module effectively requires some C knowledge, but you do not need to be an expert.

Later postings will explore the POSIX module further.

Error Handling

Always check a function call for errors. As a rule, a return value of FALSE (0 or undef) indicates that an error occurred that prevented the system call from completing successfully. The exact details of the return values of the system calls are found on the relevant manual pages. Return values should always be checked, since undetected error conditions may cause serious problems that are more difficult to locate later in a program.

Some low-level functions return a value which could legitimately be zero, and do not wish this to indicate an error. For example the sysseek function returns the new file position, which could be zero (beginning of file). In this case the text value '0 but true' is returned instead of just zero. This evaluates to TRUE in Boolean context, but 0 in numeric context, and is exempt from the usual warnings about use of non-numeric text.
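
A minimal sketch of the '0 but true' convention, using sysseek (a built-in documented as returning this string for position zero) on a scratch file. The use of File::Temp is just a convenience for the example.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# A scratch file to seek in; it is removed when the program exits.
my ($fh, $path) = tempfile(UNLINK => 1);

# Seeking to offset zero succeeds, so the result is the string '0 but true'.
my $pos = sysseek($fh, 0, SEEK_SET);

if ($pos)
{
    # True in Boolean context, yet zero in arithmetic, with no warning.
    printf "Seek succeeded, position is %d\n", $pos;
}
else
{
    die "sysseek failed: $!";
}
```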
By convention error messages should be reported to STDERR, using any of the methods shown below.

print STDERR "Oops: $!\n" # not fatal, not trappable

die "A death: $!" # fatal, but trappable

warn "Will Robinson!" # not fatal, trappable

The warn and die functions append the perl script line number (unless the message ends with a newline), and both calls may be trapped using the __WARN__ and __DIE__ handlers in %SIG. The usual way to trap die is by using an eval block (exception handling).
If a system call fails, the global variable $! will contain a system-defined error number, in numeric context, and the error text in string context. The variable $! is only set when an error occurs. It is not cleared on a successful system call. It is only safe to rely on this value when a system call has failed.
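
The following is a sketch only, showing one way the points above fit together: die trapped by an eval block, warn trapped by a %SIG handler, and the two faces of $!. The file name is hypothetical and is assumed not to exist.

```perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist.
my $missing = '/no/such/directory/input.txt';

# die inside an eval block is trapped; the message is left in $@.
my $ok = eval {
    open(my $fh, '<', $missing) or die "Cannot open $missing: $!\n";
    close $fh;
    1;
};
my $trapped = $ok ? '' : $@;

# warn can be intercepted with the __WARN__ pseudo-signal handler.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    warn "Will Robinson!\n";
}

# $! is only meaningful straight after a failure, so copy it at once:
# a number in numeric context, the message text in string context.
my ($errnum, $errtxt) = (0, '');
unless (open(my $fh2, '<', $missing))
{
    ($errnum, $errtxt) = ($! + 0, "$!");
}

print STDERR "Trapped: $trapped";
print STDERR "errno $errnum means '$errtxt'\n";
```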

Errno module

Unix recognises a standard set of error codes, but not all system calls can return all errors. Which errors are returned by which functions is described in the manual pages.

Some useful error codes and constants (conditionally exported by Errno.pm):

EPERM Operation not permitted
ENOENT No such file or directory
ESRCH No such process
EIO I/O error
ENOEXEC Exec format error
ECHILD No child processes
EACCES Permission denied
EEXIST File already exists
ENOTDIR Not a directory

Note that some errors such as EAGAIN and EINTR may not be errors but indicate that a system call must be attempted again (EAGAIN because conditions were not right for the call to complete and EINTR because the call was interrupted by some other system activity).
The package Errno defines the symbolic constants that represent system error conditions. It also defines the hash %!, which has a key for each error constant, and its value is TRUE if the error has occurred. Not all error constants are portable, so you should not assume any particular code exists in %!.

Example - Error Handling

use Errno; # No need to import anything

my $file = 'input.txt';

if ( !open (HANDLE, '<', $file) )
{
    if ($!{ENOENT})
    {
        print STDERR "$file does not exist\n";
        ...
    }
    elsif ($!{EACCES})
    {
        print STDERR "$file: Permission denied\n";
        ...
    }
    else
    {
        die "Unable to open $file: $!";
    }
}

This example shows how Errno can be used to handle different error conditions.
If the call failed, then after an error message is printed, the program terminates with die. This is safer than exit, particularly in a module, since it may be trapped in the calling routine.
Errno can also import POSIX codes with:
use Errno ':POSIX';
An alternative to the Errno module is the POSIX module:
use POSIX ':errno_h';
which exports the error number constants, but not %!.
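
As a sketch of the POSIX alternative, the constants can be compared numerically against $!. The classify_errno helper and the file name are made up for this illustration.

```perl
use strict;
use warnings;
use POSIX qw(:errno_h);

# Hypothetical helper: map an errno value to a short label.
# Only a few codes are handled; everything else is 'other'.
sub classify_errno
{
    my ($errno) = @_;
    return 'missing'   if $errno == ENOENT;
    return 'forbidden' if $errno == EACCES;
    return 'retry'     if $errno == EAGAIN || $errno == EINTR;
    return 'other';
}

my $file   = '/no/such/directory/input.txt';   # assumed not to exist
my $result = 'opened';

if (!open(my $fh, '<', $file))
{
    # Force numeric context to get the error number
    $result = classify_errno($! + 0);
}

print "$file: $result\n";
```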

Command Line Arguments

When a program is executed, it is passed a variable-length list of arguments. The shell arranges for these arguments to be the command name and arguments typed at the command line.
The argument list is accessible to the program by the array @ARGV, so the number of arguments passed can be obtained by using the array in scalar context. Unlike C, the first element is the first argument, not the program name (that is available from $0).
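There is a system-defined maximum size of the argument list. Historically this was often no larger than 5KB or 10KB; POSIX guarantees only 4096 bytes (_POSIX_ARG_MAX), and the limit on a particular system can be queried with sysconf (_SC_ARG_MAX).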

Option Processing

In the past, Unix commands were written by different contributors, often employing slightly different methods of dealing with the command-line arguments of their programs. By convention, Unix program arguments are either options, which are usually single letters preceded by the "-" character, or arguments to the command itself, usually pathnames. In recent versions of Unix, a definition has been developed of a "standard" command interface, which defines a basic format for command lines. New programs should, if at all possible, be written to conform to this format.
The format is fairly flexible, but does impose a few more restrictions:
Options must be a single character.
All options and their associated arguments must appear before the main arguments.
Multiple option characters may be grouped behind a single "-", except where an option requires an argument.
The standard module Getopt::Std offers two routines, getopt and getopts, the latter being the more useful. getopts reads the command line argument list array @ARGV, and extracts valid switches into side-effect variables, or into a predefined hash. If an option is to have an associated argument, the option letter must be followed by a colon (:) character.
Valid options are removed from @ARGV. The end of the options is signified by encountering a non-option argument (i.e. one which does not start with a "-") or the "special" option "--", which is also removed from @ARGV.
If getopts encounters an invalid option then it prints a message on the standard error stream.
Further features are available in the extended version Getopt::Long.

Examples - Options and Arguments

These two code fragments show examples of both methods of syntax.

Using global variables

use strict;
use Getopt::Std;

our ($opt_a, $opt_d, $opt_l);
getopts('ad:l');
print "a: $opt_a, d: $opt_d, l: $opt_l\n";
print "Remaining arguments: @ARGV\n\n";

Using a hash

use strict;

use Getopt::Std;
my %options;

getopts('ad:l', \%options);

while (my ($key, $value) = each %options)
{
    print "Switch: $key, value: $value\n"
}

print "Remaining arguments: @ARGV\n";

When using globals, with use strict, we must predeclare them using our. When using a hash we pass a reference.
In both cases the value is 1, TRUE, if the switch was set. When a value is required it will be set in the side effect variable or the hash.
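
For comparison, here is a sketch of the same three switches handled by Getopt::Long, which also accepts long option names. The long names (--all, --dir, --long) and the simulated argument list are invented for the example; a real script would normally let GetOptions work on @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Simulated command line, so the sketch runs without real arguments.
my @args = ('--all', '--dir=/tmp', 'file1', 'file2');

# Defaults, overwritten by any switches found
my %options = (all => 0, dir => '.', long => 0);

GetOptionsFromArray(
    \@args,
    'all|a'   => \$options{all},     # simple flag
    'dir|d=s' => \$options{dir},     # requires a string value
    'long|l'  => \$options{long},    # simple flag
) or die "Usage: $0 [-a] [-d dir] [-l] file...\n";

print "all: $options{all}, dir: $options{dir}, long: $options{long}\n";
print "Remaining arguments: @args\n";
```

As with getopts, recognised options are removed from the array, leaving only the main arguments.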

The Environment block

Every process has access to a variable-length list of pointers to strings, known as environment variables.
Environment variables can be set up from the shell with the export or setenv (csh) commands. They are useful for communicating information on a global level to multiple commands, as they can be accessed easily from within programs and shell scripts.
To see the current environment from the shell, use the env command.
The entire environment is made available to Perl through the hash variable %ENV. The keys are the environment variable names, and the values are their settings.

print "Home directory is ",
exists($ENV{'HOME'})?$ENV{'HOME'}:"unknown", "\n";

TMTOWTDI

if ( exists $ENV{'HOME'} )
{
    print "Home directory is $ENV{'HOME'}\n"
}
else
{
    print "Home directory is unknown\n"
}

The example retrieves the value of the HOME environment variable (i.e. the pathname of the user’s home directory).
If HOME is set in the environment, its value is returned; otherwise the key will not exist, and using the undefined value will give a warning when warnings are enabled. Therefore it may be advisable to check with exists first.
Incidentally, the HOME environment variable is the default directory for the built-in chdir function (like a shell cd command). If HOME is not set, then LOGDIR is used.

Limits

Since Unix is supported on a wide range of platforms, it is reasonable to assume that there will be differences in detail among them. For example some may allow long filenames where others may still impose the old limit of 14 characters. Some may support sophisticated signal-driven job control while others may not.
In developing Unix programs that are intended to work across a wide range of Unix implementations, it is desirable to be able to represent such properties in a platform independent manner, perhaps discovering details about a particular system at run time.
POSIX defines a number of symbolic constants, available from the POSIX module, that govern the details of data types, their sizes and max/min values.
Run time values can be checked using one of three functions - sysconf allows general properties such as the number of system clock ticks per second (useful in certain timing calculations), the maximum number of files that a process can hold open at one time, and the maximum number of child processes that a process can create. Properties that are more closely related to files and directories, such as the length of a filename or pathname, are discovered using the system call pathconf or fpathconf.
Many of these “runtime” limits are also mentioned in the POSIX module, but the values here are absolute minimums. For example, a system may set the maximum number of open files per process to be any value greater than _POSIX_OPEN_MAX, usually defined as 16.

POSIX::sysconf

General system limits and properties can be queried using the sysconf function. The argument is an integer symbolic constant representing the value to be queried. In C the return value is the current setting of the value, or -1; the Perl version returns undef in place of -1.
If the name parameter represents a property that is valid, but not supported, then errno ($! in Perl) will be left unchanged even though the call fails. An invalid name also causes the call to fail, and sets errno to EINVAL.

use POSIX; # for sysconf and _SC_OPEN_MAX

my $max_open = sysconf(_SC_OPEN_MAX);

if (!defined $max_open)
{
    print STDERR "Cannot determine _SC_OPEN_MAX: $!\n";
}
else
{
    print "Max open files: $max_open\n";
}

To check on values that can be queried using sysconf(), consult the manual page or the system include files.
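
The difference between a POSIX minimum and a run time value can be sketched as below. The limit_or_unknown helper is hypothetical, and the sketch assumes a platform where the POSIX module defines these particular constants (an undefined one causes a run time error when it is used).

```perl
use strict;
use warnings;
use POSIX qw(sysconf _SC_OPEN_MAX _SC_CLK_TCK _POSIX_OPEN_MAX);

# Hypothetical helper: query one limit, giving the text 'unknown'
# when sysconf cannot supply a value (it returns undef).
sub limit_or_unknown
{
    my ($name) = @_;
    my $value = sysconf($name);
    return defined $value ? $value : 'unknown';
}

my $posix_min = _POSIX_OPEN_MAX;                 # guaranteed minimum
my $open_max  = limit_or_unknown(_SC_OPEN_MAX);  # this system, right now
my $clk_tck   = limit_or_unknown(_SC_CLK_TCK);   # clock ticks per second

print "POSIX minimum open files: $posix_min\n";
print "Actual maximum open files: $open_max\n";
print "Clock ticks per second: $clk_tck\n";
```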

POSIX::pathconf

Some limits may have different values on different logical filesystems on the same overall Unix system. These are mostly related to the properties of the filesystem, such as the maximum length of a filename, or a pathname.
To query such a value, use the POSIX::pathconf function. It is similar in its operation to sysconf, except that it requires either a pathname or a file descriptor representing an already open file to be passed in addition to the name of the limit being queried. This indicates on which filesystem the limit is being checked.

use POSIX; # for pathconf and _PC_NAME_MAX

my $max_name_len = pathconf('.', _PC_NAME_MAX);

if (!defined $max_name_len)
{
    print STDERR "Cannot determine _PC_NAME_MAX: $!\n";
}
else
{
    print "Max name length: $max_name_len\n";
}

The return values from pathconf are the same as those from sysconf.
fpathconf has the same functionality as pathconf, except it takes as its first argument a file descriptor instead of a file name. File descriptors will be described later.
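
A short sketch of fpathconf, assuming a scratch file created with File::Temp stands in for "an already open file". The descriptor is obtained from the file handle with the built-in fileno.

```perl
use strict;
use warnings;
use POSIX qw(fpathconf _PC_NAME_MAX);
use File::Temp qw(tempfile);

# Any open file will do; the limit reported is for its filesystem.
my ($fh, $path) = tempfile(UNLINK => 1);

# fpathconf wants the numeric file descriptor, not the Perl file handle.
my $fd = fileno $fh;

my $name_max = fpathconf($fd, _PC_NAME_MAX);

if (!defined $name_max)
{
    print STDERR "Cannot determine _PC_NAME_MAX for $path: $!\n";
}
else
{
    print "Max name length on the filesystem holding $path: $name_max\n";
}
```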

Dates and Times

Unix systems count time as a number of seconds from a fixed starting point, called the Epoch, that is defined as midnight (00:00:00) UTC on January 1st, 1970.
This is how time is represented internally in the operating system; applications use a number of functions to translate this into the appropriate representation, taking into account timezones and daylight saving time.
To find the number of seconds since epoch, use the time() built-in function. This can be useful for timing operations to the nearest second.
On some systems it may be possible to use an alternative function, gettimeofday(). This function was originally part of the BSD Unix family; it is often useful because it provides a higher resolution of the time measured. In C, gettimeofday() requires an indication of a timezone as the second argument. It is acceptable to use NULL here, signifying the current timezone.

For measuring time in better granularity than one second, you may use either the Time::HiRes module from CPAN (bundled with Perl since 5.8), or if you have gettimeofday(2), you may be able to use the syscall interface of Perl; see the perlfaq8 manpage for details.
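
A minimal timing sketch using Time::HiRes. The quarter-second sleep simply stands in for whatever operation is being timed.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval sleep);

# In list context gettimeofday gives (seconds, microseconds)
my $start = [gettimeofday];

# Time::HiRes::sleep accepts fractions of a second
sleep 0.25;

# With one argument, tv_interval measures from $start until now
my $elapsed = tv_interval($start);

printf "Slept for %.3f seconds\n", $elapsed;
```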

Displaying Dates and Times

Although the central time representation in Unix is a count of seconds, there are many ways in which the date and time can be displayed. A collection of functions allow the programmer to convert the date from the seconds count into various formats, including formatted strings and a structured “broken-down time” representation.

The most commonly used function is localtime(), which in scalar context takes the basic second count and converts it to a string, in the correct timezone and ready for printing. By default the current time is used, but output from time() can be passed as an argument.

The Time::localtime and Time::gmtime modules offer interfaces to localtime and gmtime where the time elements are named.
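
As a sketch of the named-element interface, the example below uses the standard Time::gmtime module (Time::localtime works the same way, in the local timezone). The chosen second count is arbitrary, picked only because it makes the result easy to check.

```perl
use strict;
use warnings;
use Time::gmtime;   # exports a gmtime that returns an object

# Exactly one non-leap year (1970) after the Epoch
my $secs = 365 * 24 * 60 * 60;

# Elements are fetched by name instead of by list position
my $tm   = gmtime($secs);
my $date = sprintf '%04d-%02d-%02d',
    $tm->year + 1900, $tm->mon + 1, $tm->mday;

# The built-in is still reachable, here in scalar context for the string form
my $stamp = scalar CORE::gmtime($secs);

print "$date ($stamp)\n";
```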

“Broken Down Time” Array

Sometimes it is desirable to obtain partial date information, for example the month or year. To do this, it is best to use an alternative representation known as “broken down time”, as returned by localtime in list context.

Index  Name   Purpose                  Range   Notes
0      sec    Seconds after minute     0-61    2 "leap seconds"
1      min    Minutes after hour       0-59
2      hour   Hours                    0-23
3      mday   Day of the month         1-31
4      mon    Month of the year        0-11    July is 6 (not 7)
5      year   Years since 1900                 2003 is 103
6      wday   Days since Sunday        0-6
7      yday   Days since 1 Jan         0-365
8      isdst  In Daylight Saving Time  -1/0/1

The elements in the list allow the programmer to extract any pertinent pieces of date or time information to allow more specialised calculations to be performed. The sec range requires some explanation: normally the upper limit is 59, but up to 2 extra "leap seconds" can sometimes be added.
To translate from the number of seconds since epoch to broken down time, use localtime or gmtime:
@dt = gmtime ( $secs );
@dt = localtime ($secs );
If the argument is omitted the current time is used. The difference is that localtime will return the time correct in the current timezone, gmtime returns the time correct in GMT (basically this is the same as UTC).
Alternatively, the POSIX function strftime allows the programmer to apply printf()-like formatting so that the exact format of the date can be specified when it is printed. Consult the manual page for details of the formatting options - they are considerable…
It is possible to translate from broken down time back to seconds since epoch. To do this use the POSIX function mktime.
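
The two POSIX functions can be sketched together as follows. The format strings are only examples of what strftime accepts.

```perl
use strict;
use warnings;
use POSIX qw(strftime mktime);

my $now = time;
my @dt  = localtime($now);   # broken down time, local timezone

# strftime takes a format followed by the broken down time fields
my $stamp = strftime('%Y-%m-%d %H:%M:%S', @dt);

# mktime takes the same fields and gives back seconds since epoch
my $back = mktime(@dt);

# The Epoch itself, formatted from gmtime so the timezone does not matter
my $epoch_day = strftime('%Y-%m-%d', gmtime(0));

print "Local time $stamp is $back seconds since $epoch_day\n";
```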

Summary

Many standard Unix system APIs can be called from Perl; some are built-in and some come from modules, notably POSIX.

The original APIs were designed to be called from C, so knowledge of that language is useful. If you wish to know more about the C interfaces, come on QA's Unix Programming Course.

I hope to post examples of other Unix APIs, for example IO, Sockets, and System V IPC.

Friday, April 29, 2005

Doh!

It all started when Ryanair cancelled my flight to Dublin on a Sunday evening. That meant I had to travel on the Monday, and had half an afternoon with nothing to do. "Upgrade my laptop to Fedora Core 3!", I thought. The Devil makes work for idle hands.

Everything went wonderfully until I got to disk 4 of 4, which would not read. I had, of course, skipped the media check as being too time consuming. Mistake number one.

There is no exit from a Linux install, so I had to remove my laptop battery to try to reboot. Except it wouldn't, not even Windows XP (it was a dual boot machine).

So I tried to rescue the hard-disk by reinstalling GRUB (the Linux boot loader), no dice.

So I tried to rescue the hard-disk by booting Windows XP in rescue mode, except I had forgotten Administrator's password (it had been set under pressure a couple of years previously when attached to a client's network, and hadn't been used since).

So I tried to download a password cracker, except my company had blocked such sites.

So I asked someone else to get one for me, except I had not bothered to bring my floppy disk drive to Dublin. Double doh!

So I reinstalled Red Hat 9 and recovered the Linux partitions, but still could not boot Windows XP (although I could "see" its partitions from Linux).

Saturday, back home and on holiday (joke!). Found the Toshiba recovery disk for my machine – wonderful! Except that it blasted everything in sight, including all my Linux and Windows XP partitions.

So I reinstalled Windows XP, and started reinstalling products. (Great assistance from our support guys, I was reading emails within minutes of installing MS Office).

Before I got too far I decided to reinstall Fedora (with a new disk 4). All went OK except it would not boot – Linux or Windows! Triple doh?!

So I tried again, this time the install software crashed with an assertion failure. After much gnashing of teeth I googled the error. Turned out to be a "known" bug in the utility 'parted' (Bugzilla Bug 138419, if anyone is interested). It appears that this bug was the cause of the original corruption. I read up on the bug, and guessed that PartitionMagic might fix it from Windows, except I couldn't boot.

So I found an old MS-DOS 6 boot disk, and did a swift fdisk /mbr. That enabled me to boot Windows XP and run PartitionMagic.

Reinstalled Fedora – third time lucky!

So now it is Friday of my one week's holiday and I have reinstalled most of my software and restored my backups. I lost some Perl 6 work I had been doing, but I should be able to regenerate that fairly easily.

Lessons learnt:

ALWAYS do a media check before a Linux install.
Do not throw away Windows, it can get you out of a hole.
Linux is not bug free (!).
If you are going to trash your machine, do it when you have a week to spare.

Wednesday, April 06, 2005

Pugs

Just downloaded the latest version: http://search.cpan.org/~autrijus/Perl6-Pugs-6.0.14/
Will I ever get time to have a proper play with pugs?
What's pugs? It is a prototype Perl 6 written in Haskell. On CPAN it is rated ***** (the highest) and deserves more.
It is slow, and does not support all of Perl 6, but is only meant as a kinda 'proof of concept' tool. It is also great for sados like me who can't wait for Perl 6.

Perl 6

This is an overview of Perl 6. To avoid a posting of biblical proportions it lacks any technical detail - that will follow in further postings.

Perl 6 is the next generation of the highly successful Perl programming language. This posting discusses why Perl 6 has had to be developed, and gives an overview of changes.

Why?

Perl was originally written to replace the scripting language awk when Larry Wall, the author of Perl, became frustrated with awk’s limitations. Perl’s strength lay in text handling, and it was an obvious choice for manipulating HTML. It became known as “the duct tape of the web”, although its design pre-dated the web boom. Perl 5, released in 1994, gave it the flexibility and power to rival conventional 4GLs, and boosted it beyond mere scripting and report generation. Successive releases of Perl added database support, references, OO, embedded low-level code, and a huge range of add-on modules.

But all is not well; ten years is a long time in IT. Essentially designed to slot between C and Unix shell programs, Perl is showing its age. The syntax of Perl 5 is undeniably situated firmly in Unix-land, yet the product has an important niche in proprietary operating systems, such as Microsoft Windows.

Unix was originally a process-based system, and Perl was designed to work within that. Over recent years the advantages of multi-threading beyond "just" GUI systems have become clear, and a number of threading models are in common use. Perl has had several tries at multi-threading implementations but none have been completely satisfactory. Retrofitting multi-threading is difficult and painful.

In Perl 5 Object Orientation capabilities are tacked on as an after-thought. Frankly this is not a problem for Administrators and for small projects, but it is not scalable and lacks many features expected by today's (mostly OO) programmers. For example: a Perl 5 class has no data layout imposed or inherited (inheritance is really only through methods), no data hiding or protection is available (the whole object is visible), and there is no prototype checking for method calls. In Perl 5, multi-threading and OO do not mix; if an object is passed between threads it loses its class.

The Perl 5 language can be difficult to learn. That Perl has been able to get away with this is a tribute to its power and usefulness. Part of the problem has been a basic rule that, so far as possible, upgrades should be backward compatible. In other words, new features could not replace the old ones, only supplement them. "There's more than one way to do it" can be powerful but is confusing when trying to learn the right way to do it.

Perl has begun to lose its dominance on the web with the growth of ASP and the popularity of alternatives like JavaScript and PHP. On the application side Perl is threatened with competition from well-designed rapid development tools and the almost universal adoption of Object Orientation. On the scripting side we see the growth of modern feature-rich tools such as Bash, and the up-coming Microsoft Shell for Longhorn. Directly competing with Perl, and gaining popularity, are the OO scripting languages Ruby and Python. In the long term these could sideline Perl and destine it to the Great Recycle Bin in the sky.

The source code of the Perl interpreter has been highly hacked over the years to the extent that changes are now difficult to implement without breaking apparently unrelated code. This is a common problem with software that has had to adapt to meet changing needs and expectations.

If the language and culture are to survive, a radical change is needed.

Enter Perl 6. Perl 6 is being designed and specified - that in itself is a novelty - by a confederation of programmers from the Perl community, shepherded by Larry Wall.

What's new?

Perl 6 is designed for change, and an expected life of over twenty years. There is no desire for backward compatibility; Perl 5 scripts will rarely, if ever, run directly on Perl 6. Fortunately conversion routines and transition aids are being produced.

There is a price to pay for Perl 6 in the investment made in Perl 5 skills. Those who are comfortable with Perl 5 are naturally aggrieved to find they are beginners again. It should be made clear that Perl 6 is still undeniably Perl; after all it was written by Perl people who have probably invested more in Perl skills than anyone else. There are a huge number of new features, but many of them are optional. As with any other product, users of Perl 6 will learn and use just those features needed to get the job done.

A technical overview of changes in Perl 6 is given as an Appendix to the QA Perl 5 Programming course.

Parrot

In Perl 6, the language syntax and implementation internals have been separated, in a similar way to the Common Language Runtime in Microsoft's .Net product, and Sun's Java runtime. The language independent runtime component is called Parrot. Parrot is downloadable now, and you can write Parrot Assembler (PASM) if you really must. A higher level interface is Parrot Intermediate Representation, or PIR. PIR is expected to take the place of Perl 5's XS interface for low-level code.

Although its aim was to drive Perl 6 code, it was realised quite early on that other similar languages could be plugged into Parrot. Implementations of Ruby, Python, Java, PHP, and others, exist or are being developed. More can be added "merely" by writing the PASM or PIR interface. Part of the attraction of this approach is that objects created for Parrot may be shared, regardless of the language. The Database Interface (DBI) module from Perl 5 is being rewritten for Parrot, so it should be usable from any Parrot language. Parrot also enables general functionality such as multi-threading and garbage collection to be abstracted from the language.

While there is no specific aim to compete, there is an interesting overlap between Parrot and Microsoft's .Net, more specifically between Parrot and Mono, which is the portable implementation of .Net, now owned by Novell. For example, both intend to run Python and PHP, and C# on Parrot is a possibility. Will Parrot and .Net compete head-to-head? Possibly, but it is doubtful that Parrot will run C# as fast as .Net, or that .Net will run Perl 6 in a satisfactory way.

Perl 6 OO

Perl 6 is Object Oriented from the ground up and for many Perl programmers that will be enough reason to use it. The syntax to define a class is similar to other languages, and variables are objects. For example, to find the length of a variable we now call a method on it, rather than a library routine. Classes in Perl 6 replace packages in Perl 5, and data layout may be imposed and inherited. Private, public and protected attributes and methods are supported, as well as delegation, roles and so on.

Multi-threading and OO should co-exist happily in Perl 6, but implementation details are still under wraps. The new garbage collector should remove memory leak problems that plague Perl 5 multi-threading.

In Perl 6, methods and subroutines have been considerably improved, allowing all the features normally expected, including named and typed arguments, and overloading.

The inclusion of Object Orientation is not to everyone’s taste, and for many users of the language it will be an irrelevance. Not every job needs OO and knowledge of OO will not be a pre-requisite to enable effective use of Perl 6.

Basic syntax and operator changes

There are a number of changes to the basic Perl syntax, which will be immediately obvious to a Perl 5 programmer, such as accessing an array element or hash value. Many of the syntax elements and operators are different, and some are less than obvious. This is not the place to list them, partly because they have not yet been finalised.

A number of new operators are available in Perl 6, particularly for handling lists. Some allow operations on all elements in a list and others give the ability to join and match lists in a number of different ways.

The reasoning behind the operator changes is to make Perl 6 easier. New Perl programmers will be able to grasp its concepts more readily. Experienced Perl 5 programmers will have to learn new ways, but at least they already know the concepts and can concentrate on learning the new syntax.

Regular Expressions – Rules

A Regular Expression (RE) describes text of interest, usually for extraction or editing. It consists of a "meta-language", and the one chosen for Perl 5 is a well known dialect based on POSIX and used by many other languages and tools, including Microsoft .Net.

Perl 6 has modified the language with the intention of clarifying it. Very few of the familiar meta-characters are changed in Perl 6, but the most significant (the one everyone is shouting about) is the use of white space. In Perl 6 white space is used as in any other part of the program - it is not significant unless quoted.

Changes such as new quantifier delimiters are part of assertion syntax. Assertions are an extension of the concept of character classes that either match (true) or do not (false). Assertions return rules that can be stored meta-characters, a single literal or a list of literals, array elements or hash keys, and even code. Many of these features can be achieved in Perl 5, but rules bring them together. The Perl 6 language itself will be expressed in rules.

Traits and Attributes

A feature of most scripting languages is that variables are type-less until a value is assigned to them. Later in its life the variable (and possibly its value) might change type. Perl 5 has always had that feature and, by default, so does Perl 6. For some operations typed variables are required and, although some checking is possible in Perl 5, Perl 6 extends this with optional built-in traits applied at compile-time. Aside from the additional help these supply to program logic, they also help the optimiser.

Attributes enable a value to be overridden depending on its context at run-time. This is a familiar theme in Perl 5, but Perl 6 has introduced many more contexts and ways to force them.

When?

Good question. We were promised a development release in Quarter 3, 2005, but funding problems have caused slippage. At the time of writing a pessimistic guess is Quarter 1 2006 for the development release and Quarter 1 2007 for a production release. What state the development release will be in is anyone's guess, but for sure its website will get a huge number of downloads from day 1. The community will make it work, and work well.

Is Perl 5 dead?

No, far from it. Partly because of the uncertain Perl 6 timetable, Perl 5 maintenance continues, and releases are planned for several years yet. Unlike commercial organisations, the Perl community has no interest in pressuring people to upgrade; Perl 4 maintenance continued for some time after 5 was released.

The name Ponie is rather contrived, from Perl On New Internal Engine. It is the interface required to run Perl 5 upon Parrot. It should enable some of the advantages of Parrot, like operating system abstraction, to be fitted to Perl 5. Versions of Perl 5 running with Ponie on Parrot should be available at Perl 5.12. It is intended that eventually all Perl 5 development will move to Ponie on Parrot.

Perl 5 modules from CPAN should be able to run on Perl 6 via Ponie. This will reduce the problem of a lack of Perl 6 modules in the early days.

Conclusion

Perl 6 represents a change of emphasis for the Perl language. It should be easier to learn and use for today's generation of programmers. Existing Perl programmers will hate some changes but like others. Some will stay on Perl 5, but will be fighting a rear-guard action as the benefits of Perl 6 become more apparent.

QA (www.qa.com) intends to produce a Perl 6 for Perl 5 Programmers course as soon as a viable version of Perl 6 is available. Pure Perl 6 courses will then follow. It is intended to continue updating the Perl 5 Programming course to keep it in line with Perl 5 maintenance releases. And of course when Perl 7 is released…

From Larry Wall:
"Perl 6 is merely the prototype for Perl 7.:-)"

References

A technical overview of changes in Perl 6 is given as an Appendix to the QA Perl 5 Programming course.

The Perl 6 development web site is dev.perl.org/perl6. Larry Wall's explanations are given in Apocalypses at dev.perl.org/perl6/apocalypse, with further information and examples in Exegeses, dev.perl.org/perl6/exegesis. These are a little out of date, but the Synopses at dev.perl.org/perl6/synopsis were updated in December 2004.

A fortnightly digest of events in Perl 6 development is posted onto the O'Reilly Perl website, www.perl.com.