Dynamic Languages and Java/Dynamic Languages Grow Up
Introduction
The era of statically compiled languages is coming to an end. With computer processing power doubling roughly every two years, interpreted languages like Perl, Python, and Ruby are becoming viable alternatives for professional software development in almost every domain. Java and .NET have transformed the landscape of the traditional development model by providing incremental class loading, platform- and language-neutral bytecode formats, and very powerful and comprehensive runtime libraries.
And with an increased use of less traditional programming languages comes a demand for an even wider variety of languages. This trend can be seen across the entire software development industry; languages are becoming increasingly specialized. Dynamic languages have always been popular for traditional server-side web development. However, custom languages, created solely for web development, have finally begun to mature beyond their roots as simple template languages. The syntax of both PHP (in PHP5) and JSP (with JSTL) has become increasingly complex and expressive in recent history. On the client side, the Ajax paradigm (popularized by Google's rich web applications) pushes the need for complex and tightly integrated JavaScript code. In other areas, declarative programming languages -- in particular, rule-based languages -- are starting to find their niche in the traditional software development world. Secure languages like E have appeared to fix the problems inherent in the sandboxing and code-signing security models adopted by Java and .NET.
Ordinarily, this unbounded proliferation of programming languages would lead to a compatibility nightmare. However, this trend has been accompanied by a parallel but opposite one: the convergence of runtime environments. There are now just a few key competitors in this area and languages are gradually being reimplemented in terms of these new cross-platform, cross-language virtual machines.
Specialization of Programming Languages
Software developers today have more tools in their arsenal than at any other time in the history of the discipline. It seems like every day a programming language emerges that represents some new hybrid of existing language features, and often a few novel ones as well. This kind of exploration is important for a number of reasons. Elegant ways of representing algorithms are still being discovered. Every programming language must make compromises, and trial and error is the best way to discover how to make better languages in the future. Eventually when the entire space of possible language features has been explored, we will converge towards the most effective combination of techniques and our journey will be complete.
But from a pragmatic programmer's point of view, this presents you with a bit of a problem. Every time you need to write a new piece of code, you also have to choose a programming language to write it in -- and there are thousands to choose from! Often, due either to time constraints or risk aversion, the programming language chosen happens to be the one that the development team has had the most experience with, or has the most vocal supporters. However, with well-rounded developers who have been around the block a few times, learning a new programming language is unlikely to be a significant bottleneck in a large project. Any long-term maintenance or functional benefits would far outweigh this cost.
Familiarity should not weigh heavily in the decision to use a new programming environment, but there is one factor that absolutely must be taken into account: compatibility. You need to carefully consider the effect that using a new language or environment will have on the ability of this component to integrate well with the overall project. If you're a new developer hired on to a team that's supporting a complex system composed of 10 C++ libraries and you decide to write your new component in LISP, you're likely to get some strange looks from your team lead.
Using the right tool for the job
Let's assume that for every problem there is a set of language features that is ideal to solve that problem. For example, if a solution to the problem needs to be developed and tested very quickly, a dynamic language like Python may be advantageous. If the problem can be best modelled mathematically as a recursive function, then perhaps a symbolic language that optimizes the use of tail-recursion, such as Scheme, would be desired. If the problem required tight integration with an existing component written in Delphi or VB, then writing your component in the same language may be in order. If the problem had already been solved elegantly in a legacy system, perhaps in Fortran, then reusing that code might be best. You get the idea.
For example, if you're prototyping a system that will require a complex object model, then Ruby may be an ideal choice. In Java, you are likely to spend a lot of time creating new files, adding fields, writing getXXX() and setXXX() methods, defining equals() and hashCode() methods, etc. Much of this "filler code" can be auto-generated, but it still gets in the way when you're trying to step back and get a "big picture" view of your architecture. On the other hand, Ruby's syntax is very concise and expressive, and Ruby's object model is dynamic and lightweight. Ruby is also a purely object-oriented language, so there is no primitive vs. object distinction to keep in mind. Even nil is an object in Ruby, so you can call methods like eql? and hash (Ruby's counterparts of equals() and hashCode()) on it just as you would on any other object. Ruby makes it trivial to perform reflection and supports other features like mixins, which allow you to efficiently implement some of this "filler code" once, and have it apply to all of your objects.
For example, take multiple inheritance. The creators of Java purposely did not support multiple inheritance, because of the types of ambiguity problems that it can lead to. Instead Java provides interfaces, which allow your classes to derive from multiple types without sharing any implementation code. Ruby doesn't support full multiple inheritance either, but it has a feature called mixins, which takes the idea of interfaces to the next level. Mixins allow you to inherit implementation code from a module (Ruby's name for a class-like container of methods that cannot be instantiated) without any of the ambiguity problems of traditional multiple inheritance. This allows you to implement core methods like equals() and hashCode() once without imposing any restrictions on what else your classes can inherit. To illustrate this, I'll create a module called EqualsAndHashCode, which simply gives a class the ability to compare two instances by value, and to calculate a hash code for each instance. The full code for this module is given here:
module EqualsAndHashCode
  # All three equality methods delegate to the same value comparison.
  def ==(obj); test_equal(obj); end
  def ===(obj); test_equal(obj); end
  def eql?(obj); test_equal(obj); end

  # Combine the hashes of all non-nil instance variables.
  def hash
    h = 37
    instance_variables.each do |var|
      v = instance_variable_get(var)
      h = 17 * h + v.hash unless v.nil?
    end
    h
  end

  private

  # Equal means: same class, and every instance variable has an equal value.
  def test_equal(obj)
    return false unless self.class == obj.class
    (instance_variables + obj.instance_variables).uniq.each do |var|
      v1 = instance_variable_get(var)
      v2 = obj.instance_variable_get(var)
      return false unless v1 == v2
    end
    true
  end
end
This module can be mixed into your data container classes simply by using the include statement. It's even possible to mix this module into a specific object (using extend), rather than an entire class! Ruby also provides simple attr* statements for defining fields along with accessor and mutator methods.
This Ruby class makes use of the EqualsAndHashCode module, and provides get and set methods using the attr* syntax:
class Car
  include EqualsAndHashCode

  # generates make(), model(), year() methods
  attr_reader :make, :model, :year
  # generates color(), color=(), owner(), owner=() methods
  attr_accessor :color, :owner

  def initialize(make, model, year)
    @make = make
    @model = model
    @year = year
  end
end

c1 = Car.new("vw", "jetta", 2001)
c2 = Car.new("vw", "jetta", 2001)
# Ruby uses color=() as the convention for set methods.
c1.color=("blue")
# This is a short-cut for calling color=(). It does *not*
# set the field directly, as it would in Java.
c2.color = "red"
# == has been overridden in EqualsAndHashCode to do equality checking
print "different" unless c1 == c2
Now consider what this same class would look like in Java. In addition to defining each field, you'd also need to create getXXX() methods for each of the five attributes, and setXXX() methods for the two mutable attributes. It's possible to avoid explicitly writing equals() and hashCode() methods using reflection just as we did with Ruby (in fact, the Jakarta commons-lang project provides similar functionality), but you'd still need to implement equals() and hashCode() methods that delegate to the reflection-based versions of these methods (presumably in a utility class, probably adding an import statement to the list of extra statements you'd need).
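For comparison, a rough Java counterpart might look like the sketch below. It is only an illustration: the CarDemo wrapper class and the hand-written equals() and hashCode() bodies are assumptions made for this example, and java.util.Objects (available since Java 7) stands in for the reflection-based helpers mentioned above.

```java
import java.util.Objects;

public class CarDemo {
    /** Hypothetical Java counterpart of the Ruby Car class. */
    static final class Car {
        private final String make;
        private final String model;
        private final int year;
        private String color;
        private String owner;

        Car(String make, String model, int year) {
            this.make = make;
            this.model = model;
            this.year = year;
        }

        // Read-only attributes.
        String getMake() { return make; }
        String getModel() { return model; }
        int getYear() { return year; }

        // Mutable attributes need a getter and a setter each.
        String getColor() { return color; }
        void setColor(String color) { this.color = color; }
        String getOwner() { return owner; }
        void setOwner(String owner) { this.owner = owner; }

        // Value equality over all five fields, written out by hand.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Car other = (Car) o;
            return year == other.year
                    && Objects.equals(make, other.make)
                    && Objects.equals(model, other.model)
                    && Objects.equals(color, other.color)
                    && Objects.equals(owner, other.owner);
        }

        @Override
        public int hashCode() {
            return Objects.hash(make, model, year, color, owner);
        }
    }

    public static void main(String[] args) {
        Car c1 = new Car("vw", "jetta", 2001);
        Car c2 = new Car("vw", "jetta", 2001);
        c1.setColor("blue");
        c2.setColor("red");
        // The colors differ, so the cars are not equal.
        if (!c1.equals(c2)) {
            System.out.println("different");
        }
    }
}
```

Even in this compact form, the Java version is roughly three times as long as the Ruby class, and most of the extra lines are accessors and equality plumbing.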
These are just a few of the reasons that Ruby is superior to Java for prototyping or rapid development. Dynamic typing, less restrictive scoping rules, and a richer set of string and set manipulation features also contribute to an environment that can vastly reduce development time and costs in certain situations. Obviously there are many times that Java is preferable. For mature projects, a statically typed language with compile-time checking has many benefits. But for rapid prototyping, experimenting with new algorithms, or very short-term projects, Ruby is almost perfect. But why should we have to choose between two languages that offer such clear and complementary benefits?
Integration is not always free, but it can be cheap
Now, imagine for a moment that the cost to integrate programming languages was zero. What if you could write a base system in Java, but still write each individual component in the language that best fit the problem that component was trying to solve? This system could still be integrated into a Java-specific Application Server, or deployed to end-users via Java Web Start or as an applet, or it could run in a managed Grid Computing environment. It would not, however, be limited by the constraints of the Java programming language.
Just look at the sheer number of open source projects available right now. The variety of projects is a powerful indication that the software development world as a whole is evolving. But think of how much effort is being wasted with straight ports of the same system from one language or technology to another. No offense intended to the authors of JUnit, NUnit, PUnit, cppUnit, et al., but there are only so many ways to implement a unit-testing framework. If we could transparently integrate components across programming languages, there would be less need for this. Diversity among programming languages is a powerful thing, but when it ends up segmenting the global community of developers into groups that are forced to spend time keeping their frameworks and toolsets in lock step, that diversity can be a liability as well.
The Rise of Virtual Machines
But just because we have a large number of diverse technologies does not mean that we need to live in a world of incompatibility and duplication of effort. There are a number of techniques that can be used to allow these technologies to communicate transparently and efficiently. In particular, the simplest way of allowing two or more programming languages to integrate seamlessly is for those languages to share a common runtime environment.
Despite the fact that programming languages have continued to branch out from their common ancestors and are becoming progressively more specialized, the runtime environments for those languages are converging towards a small number of competitors. The three most significant competitors are Sun's Java, Microsoft's .NET, and the open-source Parrot project.
Each of these runtime environments is composed of several components:
- an intermediate compiled representation (i.e., a bytecode language),
- a virtual machine, consisting of an interpreter, a native compiler for that intermediate language, or both,
- a set of runtime classes implemented in that intermediate language,
- a compiler for the "flagship" source language that supports most of the features of the underlying virtual machine,
- and compilers for one or more other (legacy) source languages.
Java
Java began in 1991 as "Oak", a programming language designed specifically to implement graphical applications for Sun Microsystems' experimental set-top home entertainment system. As the popularity of the World Wide Web exploded over the next few years, the focus of Oak shifted away from interactive television and towards web browsing on personal computers. It was renamed Java, and went public in 1995. It was soon integrated into the Netscape web browser, where Java applets could be used to create dynamic, graphical applications with quick response times. Applets were popular for several years, but along the way developers began to recognize the other benefits of a language that could be compiled once and run on any architecture or platform supported by Java.
In the days before Java, software developers had to decide up front which operating systems their software should run on efficiently. This decision placed many constraints on the languages and technologies that could be used. One of the primary goals of Java was to eliminate this need for architecture-specific code by providing a platform-neutral wrapper around the operating system and underlying hardware architecture.
There is a trade-off here, of course. Because it presents the same platform-neutral interface everywhere, the Java Virtual Machine has never been as tightly integrated into each of the operating systems that it runs on as some people would have liked. Java tends to support only the features that are present and similar among most of the platforms that it supports. However, despite these drawbacks, Java has been extremely successful in its original goal of providing a "write once, run anywhere" environment for developers.
.NET
Although the engineers at Sun were heavily influenced by pre-existing languages like C++, Java made no real attempts to integrate well with existing languages. It seems that Microsoft understood this limitation and soon made their own contribution. In 2000, Microsoft announced the C# programming language, which was based in large part on Java and was designed to compete directly with it. However, along with C# came the .NET framework, which consisted of their own intermediate bytecode format called the Common Intermediate Language (CIL, or MSIL), and their own virtual machine called the Common Language Runtime (CLR). Rather than focusing on being "cross-platform" as Java did, they instead focused on "language interoperability." Integration with legacy code -- particularly code implemented in existing Microsoft languages such as Visual Basic and C++ -- continues to be one of the key marketing forces driving the push for .NET.
Parrot
Parrot is a more recent contender. It began in 2001 as an April Fools' joke, but quickly became a legitimate idea with a noble goal -- to unite the camps of Perl, Python, and Ruby developers onto a single runtime environment that could compete with Java and .NET.
Each runtime environment has its own set of properties; understanding these is key to choosing the right runtime environment for your project:
| | Java | .NET | Parrot |
| Compiled Representation | Java Bytecode | Common Intermediate Language | Parrot Bytecode |
| Virtual Machine | Bytecode Interpreter | Common Language Runtime | Parrot Interpreter |
| Runtime Library | rt.jar | .NET Framework | Parrot Library |
| "Flagship" Language | Java | C# | Perl 6 |
| Alternate Languages | BeanShell, Groovy, Nice, Scheme, Python, Perl | VB, C++ | Perl 5, Python, Ruby |
Language Interoperability
Even though .NET and Parrot have "language independence" as one of their primary goals, there is very little about them that makes them more suited to this task than Java. Clearly it is possible to implement interpreters in Java as well as to write compilers that generate Java bytecode based on source code from other languages. This book will highlight dozens of such projects that, if marketed properly, could easily rival the features of .NET and Parrot.
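As a small illustration of the first approach, an interpreter hosted on the JVM, here is a minimal sketch of a postfix calculator written in plain Java. The class name TinyInterpreter and its toy grammar are assumptions made up for this example; real language implementations are of course far more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyInterpreter {
    /** Evaluates a space-separated postfix expression such as "3 4 + 2 *". */
    public static double eval(String source) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String token : source.trim().split("\\s+")) {
            switch (token) {
                case "+":
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "*":
                    stack.push(stack.pop() * stack.pop());
                    break;
                case "-": {
                    // The right operand is on top of the stack.
                    double right = stack.pop();
                    stack.push(stack.pop() - right);
                    break;
                }
                case "/": {
                    double right = stack.pop();
                    stack.push(stack.pop() / right);
                    break;
                }
                default:
                    // Anything that is not an operator is treated as a number.
                    stack.push(Double.parseDouble(token));
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(eval("3 4 + 2 *")); // prints 14.0
    }
}
```

The interpreted "language" here is trivial, but the structure (read source text at run-time, execute it inside the same JVM as the rest of the application) is the same one that full interpreters build on.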
Getting Inside Your Programs
Enough with the theory -- let's get practical. Imagine you're a developer working on a client-side Java application that is deployed to a few thousand users across a global corporate intranet. This application's configuration is complex, and it maintains connection pools to a wide variety of remote services. It is your job to work with your support team to tweak the overall performance of the system in response to network latency and bandwidth constraints. To do this, you will need to start and stop services, monitor and adjust connection pool statistics, tweak configurations individually for each user, and perform other minor tasks behind the scenes. Oh, and by the way, you only have a few days to put something in place before the final build happens. What are you going to do?
Well, one way to approach this problem is to have all of your applications connect to a centralized service and post statistics as well as listening for requests to perform some action. However, you're not sure exactly what tasks you'll want the application to perform, or what information you may want to extract in order to make this decision later. You can't possibly enumerate every possible part of the system that may need to be analyzed and tweaked ahead of time. Even if you did, you wouldn't want to take the time to write actual Java code to do this, if only a very small percentage of it would ever be used. In addition, it's likely that every bit of support code that you do write will make the application more brittle and difficult to modify in the future. The basic problem that you've encountered here is that while the vast majority of your application must be maintainable, which requires well-architected and encapsulated code, and rock-solid, which requires efficient and resilient code, this part of the application doesn't need to be any of those things. If you were adding new business features to the application, some design time and added complexity would be a small price to pay for the improved flexibility of each component and enhanced functionality. However, in this case you don't want to increase the complexity of the application, and you don't really need a well-architected, efficient solution. You simply need a backdoor, where you can easily peek into the application and maybe twiddle a few bits.
So, what's the best way to accomplish this? I would argue that Java is not the right tool for the job. By leveraging the strengths of other, more dynamic programming languages, this problem can be solved quickly and efficiently. What tools do we already have available that can be used to solve this problem? First, we need a communication mechanism. JMX might be a good tool to use here, or perhaps even SNMP, but neither of these will help us get this done quickly. You could create your own solution using RMI, SOAP, or even pure sockets, but then you need to create a client application as well. You'll need a way to address the client applications either individually or in groups (perhaps divided geographically, or by release number). The best tool that I can think of for this, believe it or not, is IRC. Off-the-shelf IRC servers and clients are readily available -- there's a good chance that your organization uses both already -- and an IRC interface is relatively easy to build into your application.
There are various Java components that can act as IRC clients that we could use, but this only addresses a very small portion of our requirements (the communication protocol). You also need some way to allow the support team to poke around inside of the application. In theory, you could build a parser for a query language and use Java reflection to retrieve data, but this is starting to sound like an incredible amount of work. But there are already programming languages that can be parsed and interpreted at run-time, and many of them can interface with existing Java code using reflection.
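To give a sense of the reflection half of that idea, the sketch below looks up a public no-argument method by name and invokes it. ReflectionProbe, call, and the stand-in ConnectionPool class with its invented method bodies are all hypothetical; only the java.lang.reflect calls are real API.

```java
import java.lang.reflect.Method;

public class ReflectionProbe {
    /** Stand-in for an application class; the method bodies are invented for the demo. */
    public static class ConnectionPool {
        private int open = 6;
        public int getNumOpenConnections() { return open; }
        public int getMaxConnections() { return 512; }
        public void flushConnections() { open = 0; }
    }

    /**
     * Invokes a public no-argument method by name and returns its result
     * (null for void methods). Lookup or invocation failures are rethrown
     * as IllegalArgumentException.
     */
    public static Object call(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        ConnectionPool pool = new ConnectionPool();
        // Method names arrive as plain strings, e.g. typed by a support engineer.
        System.out.println(call(pool, "getNumOpenConnections") + "/"
                + call(pool, "getMaxConnections")); // prints 6/512
        call(pool, "flushConnections");
        System.out.println(call(pool, "getNumOpenConnections")); // prints 0
    }
}
```

This covers only the simplest case. Arguments, chained calls, variables, and error reporting would each need more parsing code, which is exactly the work an existing scripting language already does for you.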
In fact, there is one language in particular which:
- has a very powerful and relatively standard IRC client component,
- can be parsed and executed dynamically, and
- is likely to be used by the support team already.
Care to guess which? Take a look at this:
use Net::IRC;

my $number = 1;
my $irc = Net::IRC->new;
my $conn = $irc->newconn(Nick => "bot0001", Server => "localhost") or die;

# Join the channel once the MOTD ends (376) or is reported missing (422).
$conn->add_handler([376, 422], sub { shift->join("#botnet") });
# Nickname already in use (433): try the next numbered nickname.
$conn->add_handler(433, sub { shift->nick(sprintf("bot%04d", ++$number)) });
# Evaluate channel messages (ignoring other bots) and reply to the channel.
$conn->add_handler('public', sub {
    my ($self, $event) = @_;
    unless ($event->nick =~ /bot/) {
        my $res = eval(($event->args)[0]);
        $self->privmsg($event->to, $@ || $res);
    }
});
# Evaluate private messages and reply to the sender.
$conn->add_handler('msg', sub {
    my ($self, $event) = @_;
    my $res = eval(($event->args)[0]);
    $self->privmsg($event->nick, $@ || $res);
});
$irc->start;
With just that much code you could have a fully functional IRC bot that can respond to interactive requests (in the form of arbitrary Perl expressions). If you were to run this as a standalone Perl script, you'd have a fun little toy that could perhaps be used as a calculator. However, instead I'm going to do something far more interesting. Remember that Java application that you were trying to support? I'm going to embed this script within it using something called the Bean Scripting Framework. Now we are not limited to simply evaluating arbitrary Perl expressions, but we can actually reference Java classes and invoke Java methods.
By integrating the Perl script above with Java, we can take ordinary Java classes that already exist in our application and know nothing about IRC or user interfaces, such as:
package com.oreilly.javalangint.diagbot;

public class ConnectionPool
{
    public int getNumOpenConnections() { ... }
    public int getMaxConnections() { ... }
    public void flushConnections() { ... }
}
and interact with them directly in real-time through IRC:
#botnet> $pool = $bsf->lookupBean("ConnectionPool")
<bot0001:#botnet> com.oreilly.javalangint.diagbot.ConnectionPool@a981ca
<bot0002:#botnet> com.oreilly.javalangint.diagbot.ConnectionPool@a981ca
<bot0003:#botnet> com.oreilly.javalangint.diagbot.ConnectionPool@a981ca
<bot0004:#botnet> com.oreilly.javalangint.diagbot.ConnectionPool@a981ca
#botnet> sprintf("%d/%d", $pool->getNumOpenConnections, $pool->getMaxConnections)
<bot0001:#botnet> 6/512
<bot0002:#botnet> 12/512
<bot0003:#botnet> 510/512
<bot0004:#botnet> 8/512
->bot0003> $pool->flushConnections
->bot0003> sprintf("%d/%d", $pool->getNumOpenConnections, $pool->getMaxConnections)
*bot0003* 0/512
You can even define variables and functions that can simplify repeated tasks:
#botnet> sub percent { sprintf("%.2f%% used", 100 * $pool->getNumOpenConnections / $pool->getMaxConnections) }
#botnet> percent()
<bot0001:#botnet> 1.17% used
<bot0002:#botnet> 2.34% used
<bot0003:#botnet> 99.61% used
<bot0004:#botnet> 1.56% used
Or you can schedule periodic events to notify the support staff:
#botnet> $conn->schedule(60, sub { shift->privmsg("#botnet", "warning: " . percent()) if ($pool->getNumOpenConnections > 100) })
... 60 seconds later...
<bot0003:#botnet> warning: 99.61% used
... 60 seconds later...
<bot0003:#botnet> warning: 98.63% used
...
I know what you're thinking -- "No fair, you cheated! That's Perl." Yes, it is. Perl has a simple, well-written, standard IRC module. Perl can parse and evaluate dynamic content efficiently. And as you'll see in later chapters, integrating Perl into Java in this way is almost trivial. Is this solution particularly secure? No, but it could be made so. Is it going to win you awards for your fantastic architecture? Maybe not. But it's fast, it's easy, and it's amazingly powerful. This is what you'll be doing throughout the rest of this book -- learning to leverage the strengths of other programming languages without giving up the knowledge, experience, and code that you rely on right now.
Continue on to Chapter 2: Programming Language Safari.

