Perl on Java? An Introduction to the Sleep Language

The most popular scripting languages are available in some form for the Java platform. We have Jacl for Tcl, Jython for Python, and JRuby for Ruby. One offering is missing from this bunch: what does Java offer the Perl hackers of the world?
In this article, I would like to introduce Sleep, a Java-based scripting language heavily inspired by Perl. I wrote Sleep because I needed to build a scriptable Java IRC client and nothing like Perl was available for Java. I will describe Sleep in terms of its similarities to Perl and what it brings to the Java platform. Sleep can be downloaded from the Sleep Scripting Project home page. A simple "Hello World" script in Sleep looks like this:
println("Hello World");
To run the above script, copy and paste it into a file called hello.sl and type:
java -jar sleep.jar hello.sl
There are a zillion scripting languages on the Java platform, each with its own strengths and weaknesses. With all of these choices, why does Java need a Perl offering? Perl is an incredibly powerful language for text and data processing. Perl excels at taking input, extracting stuff from it, chewing it up several hundred times, and finally, outputting the mess however the programmer would like. Perl is often referred to as the duct tape of the Internet because it is so widely used as a "glue" language.
Sleep is primarily a glue language and was designed from the ground up to be embedded in Java applications. It pursues this goal with a two-pronged approach. The first prong is a set of Sleep APIs that allow the language to be embedded in applications and extended with new functionality. By extending Sleep, developers can practically design a domain-specific language for their applications. The second prong, and the primary focus of this article, is Sleep's similarity to Perl. Sleep steals, borrows, and begs features from Perl. One goal of Sleep is to bring Perl's incredible text/data processing and ease of use to the Java platform.
Some of Perl's best features include built-in regular expression support, powerful data structures, and an easy-to-use I/O API. Ironically enough, Sleep provides built-in regular expressions, handy built-in data structures, and an easy-to-use, unified I/O API. Sleep also has a few extra tricks up its sleeve. For example, Sleep can instantiate and talk to Java objects.
Regular expressions are a mini-language for describing patterns. Strings can be compared against regex pattern strings to check for a match. If there is a match, certain parts of the matching string can be extracted as described in the pattern string. Perl provides a bunch of operators for dealing with regular expressions. While Sleep does not support everything that Perl does, you'll find that the basics are there, and the operators are a little less esoteric.
I will use a phone number pattern as an example. This example will be simple, since a full-on regex tutorial is beyond the scope of this article. A phone number in the U.S. might consist of a three-digit area code wrapped in parentheses, followed by a space, followed by three digits, followed by a dash, followed by four digits. Or in short:
(ddd) ddd-dddd
Assume each d above means "digit." A few changes are needed to build a regular expression string that represents this pattern. The string \d represents a digit in regular expression speak. Parentheses are special characters, so to specify them literally they have to be escaped with a \ character, as in \( and \). A phone number regex pattern as described above is:
$pattern = '\(\d\d\d\) \d\d\d-\d\d\d\d';
The above pattern is great for matching "legal" phone numbers, as described. However, it does nothing to help extract information from the matching text.
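For comparison, here is how the same pattern string might be checked in plain Java with java.util.regex. This is a Java sketch, not Sleep code; the class name PhonePatternCheck is made up for illustration, and note that a Java string literal needs every regex backslash doubled.

```java
import java.util.regex.Pattern;

final class PhonePatternCheck {
    // The article's pattern, with each backslash doubled for a Java string literal.
    static final Pattern PHONE = Pattern.compile("\\(\\d\\d\\d\\) \\d\\d\\d-\\d\\d\\d\\d");

    // matches() succeeds only if the entire input fits the pattern.
    static boolean isPhoneNumber(String s) {
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPhoneNumber("(654) 555-1212")); // true
        System.out.println(isPhoneNumber("654-555-1212"));   // false
    }
}
```

Matcher.matches() requires the whole input to fit the pattern; Matcher.find() would instead look for the pattern anywhere inside the input.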
Remember that I mentioned parentheses are special? They are used in the pattern to identify which matching text to extract. To mark parts of the pattern for extraction, simply surround them with (unescaped) parentheses. The extracted substrings will then be available through the matched() function.
The following snippet compares a string to the phone number pattern above. Upon finding a match, it extracts the area code and local phone number pieces.
if ("(654) 555-1212" ismatch '\((\d\d\d)\) (\d\d\d-\d\d\d\d)')
{
($areaCode, $phoneNumber) = matched();
}
In the example above, the scalar $areaCode will have
the value 654 and the scalar $phoneNumber will be equal
to 555-1212. Pretty cool, eh?
The function matched() is tied to the last use of
the ismatch predicate. The matched()
function returns an array of substrings extracted from the matching
text. Just like Perl, Sleep allows individual elements of an array
to be assigned to scalars using the syntax above.
For the sake of comparison, I present the phone number extraction example written in Perl.
if ('(654) 555-1212' =~ /\((\d\d\d)\) (\d\d\d-\d\d\d\d)/)
{
($areaCode, $phoneNumber) = ($1, $2);
}
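Since Sleep runs on the Java platform, it may also help to see the extraction done a third way, in plain Java. In this sketch (the class and method names are hypothetical), each parenthesized group in the pattern becomes available through Matcher.group(n), which plays the role that $1 and $2 play in Perl and that matched() plays in Sleep.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PhoneExtract {
    // Unescaped parentheses create capture group 1 (area code) and group 2 (local number).
    static final Pattern PHONE = Pattern.compile("\\((\\d\\d\\d)\\) (\\d\\d\\d-\\d\\d\\d\\d)");

    // Returns {areaCode, phoneNumber}, or null when the text does not match.
    static String[] extract(String text) {
        Matcher m = PHONE.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = extract("(654) 555-1212");
        System.out.println(parts[0] + " / " + parts[1]); // 654 / 555-1212
    }
}
```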
When I was a Perl beginner, I tripped over the =~
operator. To me, it looked like an assignment being used as a
predicate. Really the above is saying "bind the pattern on the
right to the text on the left, and make the extracted substrings
available as $1, $2, etc." I used my artistic license to make the
Sleep syntax a little simpler. Hopefully Perl hackers can forgive
me.
In Perl, regular expression patterns are usually enclosed in forward slashes. This is more of a convention than a requirement (Perl accepts other delimiters too, as in m{...}); however, it is a convention nearly everyone follows. In Sleep, regular expression patterns are specified as ordinary double- or single-quoted strings.
Many of the Perl regular-expression-related goodies are available in Sleep. The Perl functions split and join are both available. The ever-popular s/pattern/string/g regex operation is available in Sleep as the &replace function.
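For readers who know the Java side better, the standard library offers rough analogues of these three operations. The sketch below uses String.split, String.join, and String.replaceAll; it only illustrates the concepts and makes no claim about the exact arguments Sleep's &split, &join, and &replace take.

```java
import java.util.Arrays;

final class RegexHelpers {
    public static void main(String[] args) {
        // split: break a string apart wherever the regex matches
        String[] fields = "apple, banana,  cherry".split(",\\s*");
        System.out.println(Arrays.toString(fields)); // [apple, banana, cherry]

        // join: glue the pieces back together with a delimiter
        System.out.println(String.join("|", fields)); // apple|banana|cherry

        // replaceAll: replace every match, like Perl's s/pattern/string/g
        System.out.println("555-1212".replaceAll("\\d", "#")); // ###-####
    }
}
```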
Regexes are a good place to talk about the differences between Java philosophy, Perl philosophy, and the Sleep philosophy.
The designers of Java built a fairly simple core language and placed most of the features in the Java class libraries. Hence, most of the complexity of Java lies within the API and not in the language itself. Perl is kind of the opposite: Perl hackers believe that many of the most commonly used features should be built directly into the language. This way, a lot of built-in syntax (AKA "sugar") can follow, to make common stuff easier to do. While this does result in a lot of power, it has also yielded a language that can be complicated.
Sleep tries to find a middle ground between these two
philosophies. Many things are built into the Sleep language.
However, oftentimes an API is relied upon to provide functionality.
The decision of where to place functionality is based on the "least
complexity" rule. If a function is easier to understand and use as a
built-in construct, then Sleep will include it as a built-in
construct. Regex matching functionality is built into the Sleep
language, as seen in the ismatch operator illustrated
above. Other regex functionality is provided with built-in functions.
Sleep aims to provide built-in power and flexibility while
maintaining a core language that is accessible to novice
scripters.
Sleep has an API for providing uniform access to sockets, processes, and files. These things are all data sources, as far as Sleep is concerned. Sleep also provides functionality similar to Perl's for manipulating byte data.
The following Perl example opens a file and reads the file contents into an array:
open(HANDLE, "myfile.txt");
@data = <HANDLE>;
close(HANDLE);
Perl provides a special syntax for dealing with file handles, called the diamond operator. The diamond operator reads either a single line or the entire contents of a HANDLE, depending on the context in which its result is used. If the result is assigned to a $scalar, a single line is read. The entire HANDLE is read when the result is assigned to, or used as, an @array.
Sleep does not use assignment context to define how functions behave. Come to think of it, Sleep does not have special syntax for dealing with I/O handles, either. All Sleep I/O handles are object scalars that reference an I/O stream.
The following Sleep example opens a file, reads all of the text from the file into an array, and closes the file. This example is the Sleep equivalent of the Perl example above.
$handle = openf("myfile.txt");
@data = readAll($handle);
closef($handle);
Sleep uses &openf to open a file stream. Other
functions exist for opening sockets, creating a listening socket,
and executing processes. These functions all return a scalar
variable that references an I/O stream. Any of these I/O sources
can be read from, and written to, using the same built-in
functions.
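java.io follows a similar idea: a file, a socket, and a process all hand back an InputStream, so one reading routine can serve all of them. The following Java sketch is a rough analogue of &readAll built on that idea; it is not Sleep's implementation, and the UTF-8 charset and the names ReadAllLines and readAll are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ReadAllLines {
    // Works for any InputStream: a FileInputStream, socket.getInputStream(),
    // or process.getInputStream() all go through the same code path.
    static List<String> readAll(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for a file, socket, or process.
        InputStream in = new ByteArrayInputStream(
                "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in)); // [first, second]
    }
}
```

To read a socket or a process instead, pass socket.getInputStream() or process.getInputStream() to the same method.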
A very cool I/O concept in Sleep is callback reading. Sleep can invoke a specified function or closure whenever data is read from a source.
The following is an example of a simple echo server written in Sleep:
sub handleData
{
println($1, "Right back at ya: $2");
}
$server = listen(3000);
read($server, &handleData);
To connect to the echo server, do this:
[raffi@beardsley ~]$ telnet 127.0.0.1 3000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Hello World
Right back at ya: Hello World
A quick explanation is in order. The echo server is created on
port 3000 using the &listen function. The
&read function is used to tie the function
&handleData to the socket stream
$server. Internally, Sleep creates a thread that
references &handleData and $server.
When data is read from $server, the function
&handleData is invoked with $server and
the read data as arguments. This process continues until
$server closes.
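The callback style can be approximated in Java with a background thread that reads a stream and passes each chunk of data to a callback. In this sketch the chunks are lines, which is an assumption made for simplicity; CallbackReader and start are hypothetical names, and the stream is an in-memory one so the example runs without a network.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

final class CallbackReader {
    // Starts a background thread that reads `in` line by line and hands each
    // line to `callback` until the stream ends or a read fails.
    static Thread start(InputStream in, Consumer<String> callback) {
        Thread t = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    callback.accept(line);
                }
            } catch (IOException e) {
                // Treat a read error like a closed stream: stop calling back.
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(
                "Hello World\n".getBytes(StandardCharsets.UTF_8));
        Thread reader = start(in, line -> System.out.println("Right back at ya: " + line));
        reader.join(); // wait until the reader thread has drained the stream
    }
}
```

An echo server along the lines of the Sleep script would pass a socket's getInputStream() to start and write each reply to the socket's getOutputStream() from inside the callback.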
Sleep provides functionality for dealing with binary data. This
functionality is similar in many ways to what Perl offers. In
Sleep, an array of byte data is stored as a string. Each character in
the string maps to one byte. Sleep provides readb($handle,
size) and writeb($handle, "data") to read and
write byte strings from an I/O source. For example, to copy a file
in Sleep, you'd write:
# copy.sl [original file] [new file]
$in = openf(@ARGV[0]);
$data = readb($in, lof(@ARGV[0]));
$out = openf(">" . @ARGV[1]);
writeb($out, $data);
closef($in);
closef($out);
Sleep also provides Perl-like pack('format', ...) and unpack('format', "data") functions for packing Sleep data into, and extracting it from, a byte string.
The following example illustrates Sleep's binary data extraction abilities. The wtmp file is used to record information when a user logs in or out on a UNIX system. The information in wtmp is stored as binary data. The Mac OS X wtmp manpage specifies the following C structure for a wtmp record:
#define _PATH_WTMP "/var/log/wtmp"
#define UT_NAMESIZE 8
#define UT_LINESIZE 8
#define UT_HOSTSIZE 16
struct utmp {
char ut_line[UT_LINESIZE];
char ut_name[UT_NAMESIZE];
char ut_host[UT_HOSTSIZE];
time_t ut_time;
};
Each wtmp entry consists of 36 bytes of data: three fixed-width strings and one 4-byte integer packed together. The following example extracts the contents of the wtmp file on Mac OS X using Sleep's &bread function, which reads data from a handle and unpacks it according to a format string of the kind &unpack uses:
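$handle = openf("/var/log/wtmp");
while (1)
{
# Z8 Z8 Z16 I: three NUL-padded strings and one integer (36 bytes)
($tty, $user, $host, $ctime) = bread($handle,
'Z8 Z8 Z16 I');
if (-eof $handle) { break; }
# ut_time is in seconds; formatDate expects milliseconds
$date = formatDate($ctime * 1000,
"EEE, d MMM yyyy HH:mm:ss Z");
println("$[10]tty $[10]user $[20]host $date");
}
closef($handle);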
A shortened snapshot of my wtmp file is below.
ttyp3 raffi Sun, 5 Jun 2005 09:58:57 +0200
ttype raffi 192.168.1.26 Sun, 5 Jun 2005 09:59:13 +0200
Editor's note: the output has been reformatted to better suit this article's web page layout.
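For comparison, the same record layout can be decoded in Java with java.nio.ByteBuffer. This sketch assumes big-endian byte order (ByteBuffer's default, and the byte order of the PowerPC Macs of the time) and a 4-byte time_t, matching the 36-byte record above; record layouts differ between systems and releases, so check the local utmp/wtmp documentation before relying on it. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class WtmpRecord {
    static final int RECORD_SIZE = 8 + 8 + 16 + 4; // 36 bytes

    final String line;
    final String name;
    final String host;
    final long time; // ut_time, in seconds since the epoch

    WtmpRecord(String line, String name, String host, long time) {
        this.line = line;
        this.name = name;
        this.host = host;
        this.time = time;
    }

    // Reads a fixed-width field and stops at the first NUL byte,
    // which is what the "Z" format letter describes.
    static String zString(ByteBuffer buf, int width) {
        byte[] raw = new byte[width];
        buf.get(raw);
        int end = 0;
        while (end < width && raw[end] != 0) {
            end++;
        }
        return new String(raw, 0, end, StandardCharsets.US_ASCII);
    }

    // Decodes one record laid out as 'Z8 Z8 Z16 I'.
    static WtmpRecord parse(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record); // big-endian by default
        String line = zString(buf, 8);
        String name = zString(buf, 8);
        String host = zString(buf, 16);
        long time = buf.getInt();
        return new WtmpRecord(line, name, host, time);
    }

    public static void main(String[] args) {
        // Build a synthetic record instead of reading a real wtmp file.
        ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE);
        buf.put("ttyp3".getBytes(StandardCharsets.US_ASCII));
        buf.position(8);
        buf.put("raffi".getBytes(StandardCharsets.US_ASCII));
        buf.position(16);
        buf.put("192.168.1.26".getBytes(StandardCharsets.US_ASCII));
        buf.position(32);
        buf.putInt(1000000000);
        WtmpRecord r = parse(buf.array());
        System.out.println(r.line + " " + r.name + " " + r.host + " " + r.time);
    }
}
```

To walk a real file, one could read it RECORD_SIZE bytes at a time with DataInputStream.readFully and pass each chunk to parse.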
Sleep provides two data structures built into the language: the hashtable (usually referred to as just a "hash") and the ever-versatile array. Sleep arrays earn that description because the same array can act as a list, a stack, or an ordinary indexed array.
To create an array and start playing with it:
@array = array("Raphael",
"Serge",
"Andreas",
"Fuzzy Puppy");
println("The last element is: " . pop(@array));
push(@array, "Mr. Anderson");
foreach $element (@array)
{
println("An element: $element");
}
In Sleep, multidimensional arrays are easy to create. Just start indexing new dimensions:
for ($x = 1; $x <= 10; $x++)
{
for ($y = 1; $y <= 10; $y++)
{
@multiplication[$x - 1][$y - 1] = $x * $y;
}
}
# print out our multiplication table
foreach $row (@multiplication)
{
foreach $column ($row)
{
print(" $[3]column |");
}
println();
}
As a side note, the string " $[3]column |" is called
a parsed literal in Sleep. A parsed literal is a double-quoted string. Scalar variable names are evaluated inside of parsed
literals. Within parsed literals, some formatting is available. For
example, $[n]var means "append spaces to $var
until the string length is n characters". A negative value
indicates that spaces should be prepended instead. Single-quoted
strings in Sleep are simple no-frills string literals.
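The padding rule is easy to state precisely in code. The following Java sketch implements the behavior as described above; the name pad is hypothetical, and leaving strings that are already long enough untouched is an assumption, since the text does not say what happens then. Java's String.format gives the same results with "%-10s" (spaces appended) and "%10s" (spaces prepended).

```java
final class ParsedLiteralPadding {
    // Positive n: append spaces until the text is n characters long.
    // Negative n: prepend spaces instead.
    // Text already at least |n| characters long is returned unchanged (assumption).
    static String pad(String value, int n) {
        int width = Math.abs(n);
        if (value.length() >= width) {
            return value;
        }
        String spaces = " ".repeat(width - value.length());
        return n > 0 ? value + spaces : spaces + value;
    }

    public static void main(String[] args) {
        System.out.println("[" + pad("42", 4) + "]");  // [42  ]
        System.out.println("[" + pad("42", -4) + "]"); // [  42]
    }
}
```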
Hashes are another Sleep data type. The Hash interface in Sleep is backed by nothing more than a java.util.HashMap. All scalar keys are converted to strings prior to storage in a hash, so the two assignments below use the same key, "1", and the second one overwrites the first:
%dictionary[1] = "The number one";
%dictionary["1"] = "The string one";
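Converting every key to a string is what makes those two lines refer to a single entry. A minimal Java sketch of that rule on top of a HashMap follows; the class StringKeyedHash is hypothetical and is not Sleep's actual hash implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class StringKeyedHash {
    private final Map<String, Object> map = new HashMap<>();

    // Every key is converted to its string form before storage, so the
    // number 1 and the string "1" name the same entry.
    void put(Object key, Object value) {
        map.put(String.valueOf(key), value);
    }

    Object get(Object key) {
        return map.get(String.valueOf(key));
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        StringKeyedHash dictionary = new StringKeyedHash();
        dictionary.put(1, "The number one");
        dictionary.put("1", "The string one"); // overwrites the entry above
        System.out.println(dictionary.get(1) + " (" + dictionary.size() + " entry)");
        // The string one (1 entry)
    }
}
```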
In terms of multi-dimensional data structures, hashes and arrays
can be mixed and matched. This is because [] is a
special operator in Sleep. It attempts to index data from whatever
expression to which it is applied. If it is applied to an expression
returning an array, it will index array data. If it is applied to a
hash, it will index hash data. This means that technically, any
expression that returns array or hash data can be indexed. For
example:
$temp = array("a", "b", "c");
println("Second element is: " . $temp[1]);
or:
println("Second element is: " . array("a", "b", "c")[1]);
Array variable names in Sleep always begin with an @, hash names with a %, and scalar names with a $. Sleep uses the symbol at the beginning of the variable name to determine which type of data structure to create when referencing a variable that does not exist:
# do we want a hash or an array in this case?
$data[0] = "Hello World";
In the example above, Sleep will silently ignore the attempt to assign a value to a $scalar that doesn't reference a hash or an array.
The symbols also apply in multi-dimensional data structures. If
the symbol at the beginning of the variable name is a %, then any
time an index is applied to a nonexistent dimension, a new hash
will be created.
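In Java terms, creating a dimension on first use is roughly what Map.computeIfAbsent does. The sketch below shows that idea for nested hashes; the class AutoVivify and its helper dimension are hypothetical, and this is not a description of how Sleep implements the feature.

```java
import java.util.HashMap;
import java.util.Map;

final class AutoVivify {
    // Returns the nested map stored under `key`, first creating an empty
    // one if that dimension does not exist yet.
    @SuppressWarnings("unchecked")
    static Map<String, Object> dimension(Map<String, Object> parent, String key) {
        return (Map<String, Object>) parent.computeIfAbsent(key, k -> new HashMap<String, Object>());
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();
        // The "user" dimension does not exist yet, so it is created here.
        dimension(data, "user").put("name", "raffi");
        System.out.println(data); // {user={name=raffi}}
    }
}
```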
The nice thing about this system is that hashes and arrays are just like any other variables, with no need for special treatment. For example, to pass an array to a subroutine:
sub multiplyAll
{
foreach $temp ($1)
{
# assigning to $temp is the same as
# assigning to the individual element
$temp = $temp * $2;
}
return $1;
}
@data = array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
printAll(multiplyAll(@data, 3));
Above, @data was passed to
&multiplyAll with no special handling. Perl would
normally "flatten" @data and pass each element as a
separate parameter unless \ were used to turn
@data into a reference. In Sleep @data is
passed as a reference automatically.
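Java behaves like Sleep here: a method receives a copy of the reference, not a copy of the collection, so changes made through it are visible to the caller. This Java rendering of &multiplyAll is a sketch for comparison; the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PassByReference {
    // Multiplies every element in place and returns the same list object,
    // much as &multiplyAll returns $1.
    static List<Integer> multiplyAll(List<Integer> values, int factor) {
        values.replaceAll(v -> v * factor);
        return values;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        multiplyAll(data, 3);
        System.out.println(data); // [3, 6, 9, 12, 15]
    }
}
```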
Another fun thing in Sleep is the ability to instantiate and talk to Java objects. This ability was added to the language to allow access to APIs that I was too lazy to create.
The following is a simple web browser created in Sleep:
#
# Simple Sleep Based Graphical Web Browser
# Java's HTML renderer isn't very good, therefore
# this browser isn't either
#
import java.awt.*;
import javax.swing.*;
import java.net.*;
$window = [new JFrame:"Sleep Based Web Browser"];
[$window setDefaultCloseOperation:
[JFrame EXIT_ON_CLOSE]];
[$window setSize:480, 320];
sub go_to_site
{
[$display setPage: [$address getText] ];
if (checkError($check))
{
println("Error: $check");
}
}
sub link_clicked
{
if ([$1 getEventType] eq "ACTIVATED")
{
[$display setPage: [$1 getURL]];
[$address setText: [$1 getURL]];
}
}
$address = [new JTextField:20];
[$address addActionListener:&go_to_site];
$button = [new JButton:"Go!"];
[$button addActionListener:&go_to_site];
$panel = [new JPanel];
[$panel add: $address, [FlowLayout CENTER]];
[$panel add: $button, [FlowLayout RIGHT]];
[[$window getContentPane] setLayout:
[new BorderLayout]];
[[$window getContentPane] add: $panel,
[BorderLayout NORTH]];
$display = [new JEditorPane: "text/html", ""];
[$display addHyperlinkListener:&link_clicked];
[$display setEditable:0];
[[$window getContentPane] add:
[new JScrollPane: $display],
[BorderLayout CENTER]];
[$window show];
The calls to the Java API are easy to spot: they are all surrounded by square brackets. Sleep's syntax for using Java objects is similar to that of Objective-C:
[reference message: argument, argument, ...]
Each call has a reference, a message, and then a colon, followed by a comma-separated list of arguments.
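One plausible way to map the [reference message: arguments] form onto Java is reflection: look up a public method by name and invoke it. The sketch below is an illustration under that assumption, not Sleep's implementation; send is a hypothetical helper, it only handles instance methods (not static members such as [JFrame EXIT_ON_CLOSE]), and it chooses among overloads by simply trying each candidate that has the right parameter count.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class MessageSend {
    // Invokes the public method named `message` on `target`. Candidates with
    // the right parameter count are tried in turn; Method.invoke throws
    // IllegalArgumentException when argument types do not fit, so the next
    // overload is tried instead.
    static Object send(Object target, String message, Object... args)
            throws IllegalAccessException, InvocationTargetException {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(message) && m.getParameterCount() == args.length) {
                try {
                    return m.invoke(target, args);
                } catch (IllegalArgumentException mismatch) {
                    // wrong overload for these arguments; keep looking
                }
            }
        }
        throw new IllegalArgumentException("no method " + message + " taking " + args.length + " argument(s)");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(send("hello", "toUpperCase"));  // HELLO
        System.out.println(send("hello", "indexOf", "l")); // 2
    }
}
```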
The web browser example demonstrates that the Sleep object syntax allows one to get stuff done with the Java API. However, working with Swing this way is a little cumbersome. A Sleep/Swing module is currently in the works to help make UI scripting in Sleep more practical.
In Sleep, you can't create new Java classes. However, interfaces can be faked by passing a subroutine or a closure to an argument expecting a specific interface. Closures and subroutines are actually one and the same. The topic of closures is covered next.
In Sleep, functions are considered "first class" types. This means that a scripter can define a new function, assign it to a variable, pass it as a value, invoke a function referenced by a variable, and so on.
To define a named closure:
sub foo
{
println("bar");
}
The named closure can be invoked as follows:
foo();
OK, that wasn't too exciting. To get technical, a named closure can also be invoked with:
[&foo];
That was confusing. It will make sense in just a minute. To assign a named closure to a variable and invoke the closure from the variable:
$var = &foo;
[$var];
Equivalently, this could have been written as:
$var = { println("bar"); };
[$var];
or:
[{ println("bar"); }];
Sleep closures are called with the same syntax used with objects. Arguments in closures are available as $1 up through $n for the nth argument. The message parameter (the word before the colon) is passed to closures as $0. This allows you to create some cool interfaces in Sleep. For example:
sub BuildStack
{
return {
this('@stack');
if ($0 eq "push")
{
push(@stack, $1);
}
if ($0 eq "pop")
{
return pop(@stack);
}
};
}
# construct a new stack closure...
$mystack = BuildStack();
# push the string "test" onto the stack
[$mystack push: "test"];
# pop the top value off of the stack and print it
println("Top value is: " . [$mystack pop]);
The example above defines a new subroutine called
&BuildStack. The subroutine returns a new closure.
Inside of the closure, the variable @stack is put into
the this scope. Inside of the this scope,
@stack is visible only inside of the owning closure
instance. A second call to &BuildStack() would
return a new closure instance with its own @stack
variable.
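For comparison, Java lambdas can capture private state in the same way. In this sketch, which uses hypothetical names, each call to buildStack creates a new deque that only the returned function can reach, and the message string is dispatched by hand much as the Sleep closure tests $0.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BiFunction;

final class ClosureStack {
    // Returns a function of (message, argument). The deque it captures is
    // private to this one instance, like @stack in the this scope.
    static BiFunction<String, Object, Object> buildStack() {
        Deque<Object> stack = new ArrayDeque<>();
        return (message, arg) -> {
            if (message.equals("push")) {
                stack.push(arg); // note: ArrayDeque rejects null elements
                return null;
            }
            if (message.equals("pop")) {
                return stack.pollFirst(); // null when the stack is empty
            }
            throw new IllegalArgumentException("unknown message: " + message);
        };
    }

    public static void main(String[] args) {
        BiFunction<String, Object, Object> mystack = buildStack();
        mystack.apply("push", "test");
        System.out.println("Top value is: " + mystack.apply("pop", null)); // Top value is: test
    }
}
```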
Closures can also be passed to Java objects expecting an interface. Any Java method call against the closure interface will result in the entire closure being executed. The message parameter ($0) will contain the name of the method Java is trying to invoke. Closures are the closest thing to objects Sleep has.
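On the Java side, the standard tool for routing every method of an interface to one piece of code is java.lang.reflect.Proxy. The sketch below shows that technique; it is one plausible way to build such a bridge, not a description of Sleep's internals, and implement is a hypothetical helper. It assumes the interface is visible from this class's class loader. Note that a proxy also routes toString, hashCode, and equals to the handler, so a real bridge would need to answer those sensibly.

```java
import java.lang.reflect.Proxy;
import java.util.function.BiFunction;

final class ClosureAsInterface {
    // Wraps a single handler so it can be passed wherever `iface` is expected.
    // Every interface method lands in the same handler, which receives the
    // method name (the role $0 plays in Sleep) and the call's arguments
    // (Proxy passes null for methods that take no arguments).
    static <T> T implement(Class<T> iface, BiFunction<String, Object[], Object> handler) {
        Object proxy = Proxy.newProxyInstance(
                ClosureAsInterface.class.getClassLoader(),
                new Class<?>[] { iface },
                (p, method, callArgs) -> handler.apply(method.getName(), callArgs));
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        Runnable r = implement(Runnable.class, (name, callArgs) -> {
            System.out.println("closure invoked as " + name);
            return null;
        });
        r.run(); // closure invoked as run
    }
}
```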
Sleep is a language for Perl hackers who also live in the world of Java. Sleep brings the power of Perl to the Java platform. Not only can Sleep extract data, parse it, rework it, and spit it back out, but Sleep can extract data from, and send it back to, Java objects. Sleep is also highly extensible, allowing new functions, operators, and constructs to be added to the language. Sleep's extensibility allows it to fit into new problem domains or be embedded into Java applications. Combine the extensibility to fit into new problem areas with powerful language features, and the possibilities are endless.
Raphael Mudge is the developer behind the scripting language Sleep and the IRC client jIRCii.