Sunday, August 2, 2015

RESTful RDP with big values

What if you want to use a big value, like a whole database table or weblog, as a Reactive Demand Programming signal value? This would make it possible to use RDP to orchestrate things like incremental MapReduce pipelines. Here's one weird trick to make it work.

In effect, each RDP signal becomes a RESTful server, speaking an HTTP-like protocol. Clients of a signal remember the ETag of the last version of the signal they've processed, and send it to the server on subsequent requests.

The protocol to retrieve the value of a signal may work like this:
  • If a client hasn't seen a previous version of the signal, it sends no ETag. The server replies with the complete value. The value may be split up into multiple pages, using something like an Atom paged feed. (Other non-sequential kinds of splits are possible: for example, tree-like data like a filesystem can be fetched in parallel using hierarchical splits.)
  • If a client has seen and processed a previous version of the signal it sends the stored ETag. There are three possibilities:
    • The content hasn't changed (i.e. the server's ETag matches the client-sent ETag), so the server replies with a status code that indicates that there is no new content (HTTP's 304 Not Modified).
    • The content has changed, so the server replies with a diff of the client's version versus the server's version.
    • The content has changed, but the server is unable to provide a diff against the client's version. This can happen for servers that do not keep a complete history, or also if the server determines that it's more efficient to have the client retrieve the whole value again instead of sending a diff. The client has to re-fetch the whole signal value, as if it had seen no previous version of the value.
I haven't worked out all the details, but I think this scheme could be made to work.

Monday, July 20, 2015

What I learned about Urbit so far

[Updated, see comment and this Reddit AMA with the Urbit team, that clarifies a lot of issues.]

Urbit is some kind of new operating system design thingy, that is kinda hard to categorize.

Some interesting design points are:
  • Urbit restricts the number of identities in the system to 232. This means Urbit doesn't have enough identities even for currently living humans. In line with the usual obfuscation going on in Urbit, such an identity is called destroyer.
  • Urbit is programmed in a weird programming language called Hoon. Hoon's (only) definition is a 10KLOC file hoon.hoon, written in itself. It uses practically random variable names (such as nail for "parse input", or vase for a "type-value pair"), not to speak of the "nearly 100 ASCII digraph 'runes'". The author acknowledges that the parser is "very intricate".
  • Efficiency-wise, Hoon is impractical as a programming language, so in the real world, the VM will recognize Hoon fragments, and replace them with C code (called jets).
This brings us to the question: why would anybody actually design an OS like that? The best explanation I've seen so far is by Kragen, who writes:

Monday, June 15, 2015

A trivial implementation of Reactive Demand Programming

I wrote a trivial implementation of RDP in JavaScript to help me understand how it works.

It's called bucky-rdp (about 200 lines of heavily commented code).

It currently supports Sirea's bconst, bpipe, and bfmap.

Here's an example:

// Create some behaviors for transforming numbers.
var bDouble = rdp.bFMap(function(val) { return val * 2; });
var bMinusOne = rdp.bFMap(function(val) { return val - 1; });

// Create a pipeline behavior of the two behaviors
var myBehavior = rdp.bPipe(bDouble, bMinusOne);

// Apply an inactive input signal to the pipeline behavior
var sigIn = rdp.makeSignal();
var sigOut = rdp.apply(myBehavior, sigIn);

// Change the input signal value and watch the output signal change
console.log(sigOut.getValue()); // Prints 3
console.log(sigOut.getValue()); // Prints 7

(This post refers to v1.0.1)

Tuesday, June 9, 2015

Grokking Reactive Demand Programming

TL;DR: RDP is an exciting declarative model of how computational processes (behaviors) are connected by continuously updating values (signals) to effect changes on storage and external state (resources).

I've come a bit closer to understanding David Barbour's Reactive Demand Programming model, and this has confirmed my previous hunch that RDP is one of the most interesting systems designs since Unix. If you're looking for new, better ways to structure dynamic, interactive applications, I strongly recommend checking out RDP.

I would call RDP an orchestration model, since it cares about how you connect and assemble components of your app, and gives you a lot of freedom in what these components do and how they interact. This also fits with David's description of an RDP application as "a complex symphony of signals and declarative effects orchestrated in space and time".

In terms of Unix, RDP's behaviors correspond to processes, signals correspond to pipes, and resources correspond to storage and other external, stateful things.

Signals (pipes, channels)

A signal continuously delivers a potentially changing value. The current implementation always updates the complete value, but RDP doesn't rule out diff/patch-based signal updates, to model e.g. a large set as a signal value.

In addition to these simple signals carrying a single value, there are also compound signals, such as (x :&: y) which represents the concurrent, asynchronous product of signals x and y, IOW a signal representing two independently updating signals. Analogously, (x :|: y) represents a disjoint sum of signals, with either x or y being active at any given point in time.

A signal is either active (carrying a value), or inactive (disrupted). Application-level errors have to be modelled as part of the value, there is no "stderr".

Behaviors (processes, computation)

A behavior won't do anything until you place a demand on it. You place a demand on a behavior by applying an input signal (the demand) to it; the behavior will produce an output signal for the duration of this application.

Multiple demands can be placed on a behavior at the same time. The behavior can either reply to each input signal with a different output signal, or with the same output signal, depending on the purpose of the behavior. For example, a "calculator" behavior may take N input signals with expressions like "1 + 2" and "3 * 5" and deliver a distinct output for each input; on the other hand, a "sum" behavior may take N input signals carrying a number and produce a total sum as the output signal, which would be the same for all inputs.

Behaviors can be composed into dataflow networks. A simple composition is the pipeline behavior, b1 >>> b2: the input signal of this pipeline behavior will be processed by the behavior b1; b1's output signal becomes the input signal for behavior b2; and finally, b2's output signal becomes the output of the whole pipeline behavior.

The Sirea Haskell implementation of RDP comes with other behaviors such as bdup, that copies a single signal into both branches of a :&: product, for creating more complex networks of signals and behaviors. There are also primitives like bfirst and bsecond for running different behaviors against the branches of product signals, and bfmap for applying ordinary Haskell functions to signals. (See Arrows.)

Resources (storage, external state)

RDP doesn't say anything about state, so it has to come from the outside. Access to stateful resources such as filesystems is abstracted through behaviors: to access a filesystem you use a behavior like readFile "foo.txt" that continuously delivers the contents of the file "foo.txt" as output signal.

Creating new resources in RDP is impossible, so resource discovery idioms are used: for example, to "create" a file, you use a UUID as its name, and it will be automatically created the first time you write to it.

I hope this has been helpful. For further reading, check out the extensive README of the Sirea RDP implementation, and David Barbour's blog.