Tuesday, September 7, 2010

Best Programming Languages of 2010

(...and why you should learn them.)

Lists of the "best of..." always carry with them a sense of debate.  Before we can build such a list, we first need to answer the obvious question: best in what way?

This is my list of languages you should learn because their best feature is that they make you learn.  To come out on top of this category, a language should have:

  • Strong, easily-available documentation
  • Healthy community of contributors who are open to questions
  • Language features which are powerful, but might stretch the brain of the programmer

Clojure

If you've never learned a LISP dialect, grab one and try it.  Just when it looks like LISP is about to disappear in a puff of parentheses, another dialect emerges that offers even more promise than the last.  Foremost these days is Clojure, and it's little surprise that it's such a golden child.

Clojure's straightforward abstraction over sequences, and its creator's interest in abstracting other domains such as time (video) and interfaces, give programmers new ideas to chew on that could influence their own code.

Built on top of the JVM, Clojure immediately has access to a wealth of existing libraries.  It also has access to a lot of up-and-coming concurrency primitives.  Its embrace of software transactional memory (or STM) is a double-edged sword, though: it gives programmers a playground for using this research topic in real-world applications, but STM is still relatively unproven under various loads.

Still, with each version trying to be more profound than the last, Clojure is sure to give you some new tricks and some new ways of thinking.

Alternatively, if you're not interested in the JVM or concurrency, grab a copy of DrRacket or another Scheme.  You'll still get the LISP experience.  As a bonus, there are a number of books to get you started.

Factor

Once you've worked your way through thinking of programs as lists, it's time to move to stacks.  Drawing inspiration from Forth, Factor is a modern stack-based language with a highly optimized run-time and a tightly knit community of fans.

In short order, once you start using Factor, your brain will stretch.  Using the stack as an abstraction allows for seemingly impossible concision, but to be able to even read such feats, you'll be spending hours in the Listener (pictured to the left) trying out different functions (called 'words' in Factor) to see what they do and how they affect elements on the stack.
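
To give a feel for what "words" operating on a stack means, here is a minimal sketch in Python.  It is not Factor and does not use Factor's implementation; the `run` function and the `WORDS` table are hypothetical names made up for this illustration.  The word names themselves (dup, swap, drop, +, *) do exist in Factor, and the toy versions below only imitate their stack effects.

```python
# A toy postfix evaluator: every "word" pops its inputs from a shared
# data stack and pushes its outputs back. Illustration only, not Factor.

def dup(stack):
    stack.append(stack[-1])                      # ( x -- x x )

def swap(stack):
    stack[-1], stack[-2] = stack[-2], stack[-1]  # ( x y -- y x )

def drop(stack):
    stack.pop()                                  # ( x -- )

def add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)                          # ( a b -- a+b )

def mul(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)                          # ( a b -- a*b )

WORDS = {"dup": dup, "swap": swap, "drop": drop, "+": add, "*": mul}

def run(program, words=WORDS):
    """Evaluate a whitespace-separated program and return the final stack."""
    stack = []
    for token in program.split():
        if token in words:
            words[token](stack)
        else:
            stack.append(int(token))  # anything else is an integer literal
    return stack

print(run("3 dup *"))     # squares the top of the stack: [9]
print(run("1 2 swap"))    # [2, 1]
print(run("2 3 + 4 *"))   # [20]
```

Reading "3 dup *" as "push 3, duplicate it, multiply" is the kind of mental stack-tracking the Listener helps you practice.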

Because of Factor's simple syntax, domain specific languages are a natural fit.  In fact, some look nearly identical to textbook examples.  That's not to say Factor is without its contortions, but luckily the creators have seen fit to add syntactic sugar to help with that regard (like, for example, allowing you to name elements on the stack and refer to them later).

In short, Factor is well worth the time.  Doubly so for those interested in languages from a research point of view, as Factor is on the cutting edge of what's possible in efficiency in dynamically typed languages.

Haskell

How could I not mention Haskell?  In less than a decade Haskell has risen from relative obscurity to a name most programmers have heard, even if they haven't tried it.  Few languages have garnered this kind of name recognition.  Who hasn't heard of monads?  They're the scary 700-lb gorilla of this whole way of programming, right?

The reason to learn Haskell is almost philosophical at this point, and that's so you won't be afraid of it.  What are the main parts of Haskell that seem to scare people?

  • Type system - It's not only statically typed, but you might say it's thoroughly typed.  On the plus side, Haskell uses it for everything from testing and error-checking to interface conformance and even automatic test generation.  Yet it forces you to be honest about everything your function does.  Some people see being so open as restricting.  Others see it as liberating.  You should at least try it for a while to see what you think.
  • Monads - The gorilla we mentioned earlier.  Luckily there are tutorials that can help tame it down to a reasonable size.  Whether you like them or not, you'll be able to see how they're used and why they might be useful in various situations.
  • Vocabulary - As if monads weren't enough, you're likely to see words like functor, category theory, and catamorphism.  The trick is to look at those as just "more gorillas" that you can work with and understand.
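
As a small taste of why the monad gorilla is tamer than it looks, here is a rough sketch in Python of the pattern behind Haskell's Maybe monad: chaining steps that can each fail, without checking for failure after every step.  This is only an approximation.  The helper names (`bind`, `parse_int`, `reciprocal`, `safe_reciprocal`) are invented for this example, and using None as the "nothing" value is a simplification that Haskell's type system would not need.

```python
# Maybe-style chaining: once any step yields None, later steps are skipped.

def bind(value, func):
    """Apply func to value, unless value is already None."""
    return None if value is None else func(value)

def parse_int(text):
    """Return the integer in text, or None if it is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None

def reciprocal(n):
    """Return 1/n, or None when n is zero."""
    return None if n == 0 else 1 / n

def safe_reciprocal(text):
    # Two fallible steps chained with no explicit error checks in between.
    return bind(parse_int(text), reciprocal)

print(safe_reciprocal("4"))    # 0.25
print(safe_reciprocal("0"))    # None (division step failed)
print(safe_reciprocal("abc"))  # None (parsing step failed)
```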


Each of these languages offers fertile ground to explore for weeks, months, and even years, and there's enough there to entertain you, amuse you, and possibly even teach you something new.  If you have other suggestions for languages, and why they might stretch your brain, post them in the comments.

Monday, September 6, 2010

How to talk High-Performance Computing

HPC, like most fields, has its own vocabulary.  It's close enough to general-purpose programming that it almost seems like they're speaking the same language, until you listen more closely.  Here are a few tips to help you along the way:

  • The three most important parts of a supercomputer are cooling, the network, and packaging.
    • Cooling means cooling like everyone else uses it.  Without it the machine would only last a couple seconds at most.
    • The network means the network like everyone else uses it.  Here you connect a ridiculous number of endpoints in a relatively small space.
    • Packaging doesn't mean UPS and FedEx packaging.  Instead, packaging is how the various chips, cooling, and memory are laid out on the board and "packaged" together.
  • More than one memory is memories.  Discontiguous or heterogeneous memory is also memories.  It's not RAMs, though.
  • More than one source code is codes.  Benchmarks are also codes.
  • HPL, LINPACK, LAPACK, and ScaLAPACK are all benchmarks with various codes.  There are other benchmarks, but people rarely talk about them.
  • Benchmark speeds are measured in FLOPs, which is floating-point operations per second.  There are gigaFLOPs and teraFLOPs.  petaFLOPs are next, which has something to do with animals.
  • Speaking of animals, supercomputers are supposed to have cool names like Jaguar and Kraken.    
  • Supercomputers can be built of nodes, but clusters are for wimps.  Instead use blades, backplanes, and cabinets.
  • HPF stands for High-Performance Fortran, but usually it's used as a swear word when you talk about software (HPF being one of the most public failures in the HPC language community).
  • On the other hand, Fortran is still cool, since it's what pretty much everyone still uses.
  • DARPA is a synonym for the mint.

Thoughts on actor-based languages

(or "Minnow, a post-mortem")

I've been fortunate to work on both Minnow (a shared-nothing actor-based language) and Chapel (a concurrent global-view language).  I wanted to give a few thoughts comparing the two approaches.

First, when we talk about concurrency or parallelism, without going into the pedantic definitions of each, we should answer the question "why not stay serial?"  There are a lot of reasons to do so, if we can.  Serial code is going to be easier to write, easier to debug, and easier to reason about.  Generally, going parallel is just going to complicate our lives.

I see two main reasons why going parallel would be required:

  • Reliability - Does the system have to tolerate failures of its parts?
  • Performance - Is serial too slow for your application?

In Erlang, reliability is king.  Its shared-nothing actor-model style allows actors to fail and be restarted, which allows for more advanced features like hot code swapping.

For areas where performance is king, C and Fortran still reign.  In high-performance scientific computing, it's common to remove abstractions and program as "close to the metal" as possible.  Though this doesn't preclude reliability, that reliability must either be done by the programmer or by the underlying architecture.  It is, as you can imagine, very fast.  

When I created Minnow, I set out to create an actor-based language that could perform as fast as C but potentially gain the benefit of reliability from the actor abstraction.  Focusing solely on a single-node, shared-memory machine, I was able to optimize:

  • Message sends, down to a few atomic operations
  • Number of actors possible, using light-weight threads and continuations
  • Run-time overhead, even in the presence of fairness counters for cooperative multi-tasking, by minimizing extraneous work in loops and function calls

In short, Minnow was quite fast.
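
For readers who haven't used an actor language, the shape of the abstraction can be sketched in a few lines of Python.  This is not how Minnow was implemented: it uses one OS thread and a locked queue per actor, which is exactly the overhead the optimizations above were meant to avoid.  The `Actor` and `Counter` classes and the message format are hypothetical, chosen only to show private state plus a mailbox.

```python
import queue
import threading

class Actor:
    """A minimal actor: private state, a mailbox, and a loop that
    handles one message at a time."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop)
        self._thread.start()

    def send(self, message):
        self.mailbox.put(message)

    def stop(self):
        self.send(None)        # None is the shutdown sentinel
        self._thread.join()

    def _loop(self):
        while True:
            message = self.mailbox.get()
            if message is None:
                break
            self.receive(message)

class Counter(Actor):
    def __init__(self):
        self.total = 0         # set state before the actor thread starts
        super().__init__()

    def receive(self, message):
        kind, payload = message
        if kind == "add":
            self.total += payload
        elif kind == "get":
            payload.put(self.total)   # reply through the sender's queue

counter = Counter()
for n in (1, 2, 3):
    counter.send(("add", n))

reply = queue.Queue()
counter.send(("get", reply))
result = reply.get(timeout=5)
counter.stop()
print(result)   # 6: messages are handled in the order they were sent
```

Nothing outside the actor touches `total` directly; all interaction goes through messages, which is the property that makes restarting a failed actor plausible.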

Was it fast enough?  As it turns out, the answer is no.  Though Minnow avoided copying where possible, it did not allow the user to see a large matrix and operate on pieces of it.  Doing so would go against the notion of shared-nothing; we might even say that shared-nothing runs counter to the goal of performance.  Algorithms where we could split data into independent chunks, for example, wouldn't suffer, but in general we can no longer use contiguous memory in a natural way.
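
The tension can be shown with a small Python sketch, under the assumption that a "piece" of the matrix is one row of a row-major buffer.  The function names `row_copy` and `row_view` are made up for this example.  A shared-nothing design hands the receiver a copy, so updates never reach the original, while a shared-memory design hands out a window onto the same storage.

```python
from array import array

ROWS, COLS = 3, 4
# A 3x4 matrix stored row-major in one contiguous buffer: 0.0 .. 11.0
matrix = array("d", [float(i) for i in range(ROWS * COLS)])

def row_copy(data, r):
    """Shared-nothing style: slicing an array allocates a new one."""
    return data[r * COLS:(r + 1) * COLS]

def row_view(data, r):
    """Shared-memory style: a memoryview slice aliases the same buffer."""
    return memoryview(data)[r * COLS:(r + 1) * COLS]

copied = row_copy(matrix, 1)
copied[0] = -1.0
after_copy = matrix[4]    # still 4.0: the original never saw the write

view = row_view(matrix, 1)
view[0] = -1.0
after_view = matrix[4]    # now -1.0: the write went straight through

print(after_copy, after_view)
```

With copies, getting results back into the big matrix takes a second message and a second copy; with views, the work happens in place.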

The modern batch of concurrent languages (Chapel, X10, Co-Array Fortran, and others) returns to looking at memory in a contiguous way.  In fact, using a partitioned global address space (or PGAS), the notion of contiguous memory is extended to encompass the memories of all the nodes in a cluster.
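
One way to picture a partitioned global address space is as a mapping from a global index to a node and an offset within that node's memory.  The sketch below assumes a simple block distribution, which is only one of several layouts real PGAS languages offer; `block_owner` is a hypothetical helper and not an API from Chapel, X10, or Co-Array Fortran.

```python
def block_owner(global_index, total, nodes):
    """Map a global index to (node, local_offset), assuming the array is
    split into equal-sized blocks with the last block possibly shorter."""
    block = -(-total // nodes)          # ceiling division: elements per node
    return divmod(global_index, block)

# 10 elements over 4 nodes gives blocks of 3:
# node 0 holds 0-2, node 1 holds 3-5, node 2 holds 6-8, node 3 holds 9.
print(block_owner(0, 10, 4))   # (0, 0)
print(block_owner(7, 10, 4))   # (2, 1)
print(block_owner(9, 10, 4))   # (3, 0)
```

The programmer indexes the array as if it were one contiguous block, and the language and run-time decide whether a given access is local or has to cross the network.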

The question of where reliability should live, and whether we should push it into the language and sacrifice performance, appears to be a solved issue in the high-performance community.  It remains to be seen if the larger programmer community will embrace this approach.


Many-core on a budget

On a whim, I looked into what was available on Newegg. Dual 8-core AMD setups (16 cores total) look like they start around $1000 for both processors and the motherboard combined. Then add RAM, a case, etc. Two years ago I bought a machine to do some of my first concurrent programming language work, before coming to college. At the time, the only options that didn't cost a fortune got you to about 8 cores, and you had to put up with strange server PCI slots.

It remains to be seen how long AMD and Intel will be able to keep adding cores before their users figure out they don't actually need them. Maybe it'll be like having a car that goes more than 85 mph when so few roads actually let you go beyond that.

Or, we'll actually begin to see killer apps for concurrency on the desktop.

Programming of a different sort

Not sure how I missed this earlier in the summer, but it appears that J. Craig Venter's team successfully reprogrammed and rebooted a cell that was then capable of self-replication. Even though I'm not a biologist (is IANAB even a term?), I thought the press conference was understandable and worth the watch.

Saturday, September 4, 2010

More DrRacket

When you first open DrRacket, you get a screen like the one on the left.

The top half is a text editor. The bottom half is the 'interactions' window, or REPL, which allows you to write Racket code and test it immediately.

Some useful interactions shortcuts:

esc,p or ctrl-up: previous command in history
esc,n or ctrl-down: next command in history

Now the official home

After losing my old hosting service, I decided to simplify and just have a simple blog again. Mmmm, simple blog.

Let us hope it will be considerably more comfortable than a simple bog.