Compiling in parallel and deferred threads

There is a lot of parallelism in compilers. I thread tokenizing, parsing and some other stuff. I tried compiling each function in a thread (i.e. one thread per function) but that overwhelmed my two core system and was slower than a single thread. My next attempt at parallelizing the compile phase was to split the convert-the-parse-tree-to-asm-code and convert-asm-code-to-VM-instructions stages. Functions are the only objects that actually contain runnable code, so it was pretty easy to do the split. Another observation is that there isn’t always code that needs assembling, so it would be nice to avoid creating a thread in that case (threads aren’t cheap).

Performance: This was good for about a 15% speed up (AMD dual core). The asm thread appears to be easily able to keep up with the compiler and probably should be given more work to do.

The first order of business is to create a thread class that will assemble code, and only create that thread if something needs to be assembled. The thread works by sucking in functions (plus some supporting info) to assemble from an inter-thread pipe, calling the assembler and writing the results to another pipe (so they can be consumed by another thread and squished into the object being compiled):

class FcnAsmThread {
   fcn init {     // --> Deferred Thread/Pipe, Pipe
      var [const] in  = Thread.Pipe(),
                  out = Thread.Pipe();
        // Only start thread if actually need it
      returnClass(
         T(Deferred(fcn{ self.launch(); in }),out));
   }
          // asm a fcn and write it to out in a thread
   fcn liftoff {
      foreach pFcn,vmClass,cblock,defaultArgs,index in (in) {
         reg names = T(pFcn.name,pFcn.argNames.xplode());
           // .finish() calls Asm.asm()
         reg func = self.fcn.embryo(names,vmClass,
            cblock.finish(),defaultArgs,pFcn.isPrivate);
         // Classes are not thread safe,
         // add fcn in other thread
         out.write(T(vmClass,func,index));
      }
      out.close();
   }
}

Crib sheet:

  • Creating a new instance of this class doesn’t return a reference to itself (self), it returns two pipes. Thus, the new class is “cloaked” and the only way to access it is via the pipes. returnClass is used to override the normal return mechanism.
  • The input pipe isn’t really a pipe but a Deferred object which, when first referenced (as in in.write()), runs the contained function (in this case, launching the thread and returning the pipe) and then performs the original operation (the write). Subsequent references get the pipe directly.
  • T() creates a read-only list.
  • A call to launch() creates the thread and the new thread calls the liftoff function when it starts running.
  • fcn.embryo() creates an “embryonic” function that is ready to be inserted into a Class.
  • A foreach loop on a pipe reads from the pipe until it is closed.
  • Exiting liftoff stops the thread.
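For readers who don’t speak zkl, here is a rough Python sketch of the same pattern (my translation, not from the original post): queue.Queue stands in for Thread.Pipe, and a start-on-first-write flag stands in for the Deferred. The names FcnAsmThread and liftoff are carried over; everything else is an assumption.

```python
import queue
import threading

class FcnAsmThread:
    """Worker that assembles jobs from an input queue and writes results
    to an output queue. The thread is only launched on the first write,
    standing in for the Deferred-wrapped pipe in the zkl version."""
    def __init__(self, assemble):
        self._in = queue.Queue()
        self.out = queue.Queue()
        self._assemble = assemble
        self.started = False         # plays the role of in.BaseClass.toBool()

    def write(self, job):            # called only from the compiler thread
        if not self.started:         # lazy launch: no job, no thread
            threading.Thread(target=self._liftoff, daemon=True).start()
            self.started = True
        self._in.put(job)

    def close(self):
        if self.started:
            self._in.put(None)       # sentinel plays the role of pipe.close()

    def _liftoff(self):
        while True:
            job = self._in.get()
            if job is None:          # input closed: stop the thread
                break
            self.out.put(self._assemble(job))
        self.out.put(None)           # out.close()
```

The single-threaded caller requirement matters: the lazy launch in write() is not itself thread safe, which matches the crib-sheet point that all accesses to the Deferred happen in the compiler thread.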

________________________________

The thread is readied at the start of a file (or chunk of text) compilation. This is a bit awkward as the compiler is written as a reentrant state machine, so there needs to be some deduction as to when to fire off the thread. Since a file is encapsulated as a class (a “RootClass”), do it when one (and there can be only one) of those is encountered. If a source code error is encountered, the thread needs to be shut down, but care needs to be taken so that, if the thread hasn’t started, attempting to stop it doesn’t start it.

fcn compileClass {
   …
   reg conBlock,in,out;
   if (pClass.isInstanceOf(Parser.RootClass)) {
      // Create a thread to asm the compiled fcns
      // in is a Deferred Thread/Pipe, out is a Pipe
      in,out = FcnAsmThread(); pClass.asmPipe = in;
      onExit(fcn(in,out){   // all done compiling this file
         if (in.BaseClass.toBool()) { // thread was started
            if (vm.xxception) in.clear();
            else {   // add fcns to their Class
               in.close(); FcnAsmThread.mashUp(out); }
         }
         // else thread wasn't started, do nothing
         return(Void);
      },in,out); // onExit
      conBlock = compileBlock(classBlock,vmClass);
   }
   …
}

Crib sheet:

  • All changes and accesses to the Deferred are made in the compiler thread; a good thing, as Deferred objects are not thread safe.
  • A “block” contains the source for a class; if a class (such as the RootClass/file) contains classes, compileClass is recursive.
  • Asking a Deferred to convert itself to a Bool (e.g. if (Deferred) …) references the underlying object (the pipe). We don’t want that. Using .BaseClass.toBool() prevents evaluation and directs the question to the container. The result is True if the Deferred has been evaluated (i.e. has the thread been created?). I need this check because if I just called in.close() and the thread hadn’t started, it would start.
  • Closing the input pipe stops the thread (since all it does is read from the pipe).
  • Once the thread has stopped, out contains all the assembled functions.
  • The onExit keyword creates a deferred object (in this case, a function) that is evaluated when the enclosing function exits/returns. Using onExit means this code will always run, so I don’t have to worry about exceptions or whatever, and it puts the thread finish-up code close to where the thread is created.
    If an error caused the compilation to stop, clear and close the pipe, which will stop the thread.
    If the thread was started, functions were assembled, so close the pipe, stop the thread and put the functions into their class.
    No matter what happens, I know the thread will be stopped.
    Returning Void ensures that in won’t be mistakenly evaluated.
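The shutdown logic above boils down to a try/finally in imperative pseudo-terms. A minimal Python sketch (LazyWorker is a hypothetical stand-in for the assembler thread; only the started/closed bookkeeping is real here): the finally clause plays the part of onExit, and the started check mirrors in.BaseClass.toBool().

```python
class LazyWorker:
    """Hypothetical stand-in for the assembler thread."""
    def __init__(self):
        self.started = False
        self.closed  = False
    def write(self, job):
        self.started = True      # first write would launch the thread
    def close(self):
        self.closed = True       # closing the input pipe stops the thread

def compile_file(jobs):
    worker = LazyWorker()
    try:
        for job in jobs:
            worker.write(job)
    finally:
        # Always runs, like onExit: exceptions or not.
        if worker.started:       # don't start the thread just to stop it
            worker.close()
    return worker
```

If no functions were compiled, the worker thread is never started and never needs stopping.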

______________________________________

The compileFcn function does a bunch of stuff then sends the function code out to be assembled.

fcn compileFcn(pFcn,lexBlock,vmClass) {
   …
   reg rootClass = lexBlock.findRootClass();
   rootClass.asmPipe.write(  // pass to thread
       T(pFcn,vmClass,cblock,defaultArgs,
         lexBlock.fcnIndex(fcnName)));

   …
}

Crib sheet:

  • The RootClass contains the pipe to the thread. As the compiler is a state machine, the RootClass is located in the passed-in context (parse tree).
  • If the thread hasn’t started yet, calling write will cause the Deferred object to evaluate and start the thread and then perform the actual write.
Posted in Uncategorized | Leave a comment

Chained Comparisons

A + B + C is something I’m calling a “chained” operation: a sequence of binary operators of equal precedence. Standard stuff. So why aren’t comparison operators chainable? Probably because chaining isn’t used much. I looked through thousands of lines of code and found two places where it would be useful, one in the tokenizer:
   else if (0xD800 <= u <= 0xDFFF)
      // error, illegal Unicode character

and the other in a script to print a character (or “.” if the character isn’t “printable”):
   bytes.reduce(fcn(d,c){
      d.append(if (0x20 <= c <= 0x7E) c.toChar() else ".") },
      d.clear() )
Pretty simple stuff. Another reason could be that the parsing gets kinda ugly after three operands. So, to give myself brownie points for implementing it and to keep things simple and sane, I set the following conditions:

  1. One or two operators: “A O1 B” or “A O1 B O2 C”, where On is a comparison operator (==, !=, <, <=, >, >=).
  2. If there are two operators, the operators must be of equal precedence. This means you can group ==, != or <, <=, >, >=.
    “A==B<C” is just painful to parse and if it isn’t used, why bother? “(A==B)<C” doesn’t work because of 5.
  3. Logic operators “and” and “or” terminate a chain. Don’t break “A==B and C==D” but parse “(A==B==C) and D==E”. Those ()s are required because I don’t want to chase the expression tree.
  4. “A O1 B O2 C” --> “(A O1 B) and (B O2 C)” where B is evaluated only once.
  5. ()s scope, just because that is sane. It isn’t very useful as “(A O1 B O2 C) O3 D” is “(BOOL) O3 D”, so D has to be a boolean for the expression to be meaningful.

Rule 4 is the stinker. You basically have to rewrite the expression and use a stack or temporary. If the operands are complicated, you could have a real mess on your hands. Rule 1 is a counter, 2 is a check, 3 is just a write barrier and 5 is a stack.

I compile these expressions in two steps:

  1. Convert the expression to RPN, inserting special tokens to save/restore a result and do the “and”.
    “A O1 B O2 C” --> “A, B, saveB, O1, restoreB, C, O2, and+”
  2. Compile the RPN.
    Figure out how to save/restore. The and+ has to clean up if the result is False (because the entire expression is False, so we can shortcut and are done). This is why operators are limited to two; additional terms would have to be saved somewhere and I’d rather not do the stack foo or temp management.
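Python happens to support chained comparisons natively, which makes it convenient for sketching rule 4. The hypothetical chain_eval below (my illustration, not the compiler's code) takes thunks so the single evaluation of B, and the short-circuit past C, are observable:

```python
import operator

OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def chain_eval(a, op1, b, op2, c):
    """Evaluate "A O1 B O2 C" as "(A O1 B) and (B O2 C)",
    with B evaluated exactly once and C skipped on False."""
    bv = b()                     # the save/restore of B in the RPN scheme
    if not OPS[op1](a(), bv):
        return False             # and+ short-circuits: whole chain is False
    return OPS[op2](bv, c())

# The tokenizer test from above: 0xD800 <= u <= 0xDFFF
b_calls = []
def u():
    b_calls.append(1)
    return 0xDA00

print(chain_eval(lambda: 0xD800, "<=", u, "<=", lambda: 0xDFFF))  # True
print(len(b_calls))  # 1: the middle operand ran once
```

The early return on False is exactly the cleanup-and-shortcut behavior described for and+.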

Now I can do stuff like a()==try{b()}catch{c()}==d(). But I don’t, sigh.


Infinite sequences and lazy evaluation

I’ve been trying to wrap my brain around functional programming, without much success unfortunately. I really like the ideas but I seem to think imperatively. So I’m trying to take little bites and see if I can express them in a language I can grok. Some of the ideas are straightforward, like map, reduce, etc. A really difficult one for me is infinite sequences and lazy evaluation. An infinite sequence seems like a really simple concept: a function that, when called, produces the next value in a sequence or just continually pumps out values. Such as a thread that writes random numbers to a pipe or a function that calculates Fibonacci numbers:

fcn fibSeq1 {
   var a = 1, b = 1;   // initial values
   reg t = a;
   a = b; b += t;
   return(t);
}
foreach i in (20) { print(fibSeq1(),","); } --> 1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,
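In Python the same stateful pump is naturally a generator; this is just my translation of fibSeq1, not part of the original post:

```python
from itertools import islice

def fib_seq1():
    a, b = 1, 1
    while True:        # infinite: the caller decides when to stop
        yield a
        a, b = b, a + b

print(",".join(str(n) for n in islice(fib_seq1(), 20)))
# 1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765
```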

Lazy evaluation is a really cool concept: if you don’t have to compute, don’t; just remember what you need to do if somebody calls your bluff. For me (ignoring && and ||), the classic example is deferred module loading:

// Create a placeholder but don’t actually do anything
var Date = Deferred(Import,“Time.Date”);
<bunch of stuff, maybe leaving this control flow>
   // import module and call function in that module
println(Date.prettyDay());
--> Tuesday, the 10th of August 2010

OK, that was easy to digest, how about infinitely recursive sequences? Again, Fibonacci:

fcn fibSeq2(a,b) { return(a,self.fcn(b,a+b)) }
fibSeq2(1,1) --> (1,2), (2,3), (3,5), (5,8), …

Given any two consecutive Fibonacci numbers, this gets into some serious navel gazing calculating the rest of the infinite sequence. Until the stack blows, anyway. My question is/was: how can this be useful? All it does is produce a core dump. The Haskell people say this is indeed very useful. How so? Well, with lazy evaluation, you can control the recursion and get it to stop when you have calculated the desired result. Check out this Haskell lazy evaluation example (now that I kinda know what is happening, it is lucid, but that took quite a bit of time). My “quest” was to transpose that example into my iterative world view. The question is: given an infinitely recursive sequence, how can I get the nth item in that sequence? The sequence is represented as
    fibSeq3(1,1);
which is a slightly tweaked fibSeq2, as I don’t have support for implicit lazy evaluation. To get the third fib:
    nthFib(fibSeq3(1,1),3);
Thus the first veil presents itself: how can I pass a call to fibSeq3 without getting that infinite recursion? Just defer it:
    nthFib( Deferred(fibSeq3,1,1) ,3);
OK, that gives me a placeholder for the start of an infinite sequence. Since fibSeq3 returns a pair, where the first item is the nth fib, if the count reaches 1, we are done and the answer is that first item. Otherwise, decrement the count and get the next fib pair:

fcn nthFib(seq,n){
   if (n <= 1)          // done, stop recursion
      return(seq[0]);
   return(self.fcn(seq[1],n-1));  // tail recursive
}

How to stop fibSeq3’s infinite recursion at the nth item? That is where the weird turn pro: the subscripting (“seq[n]”) is magic. How can you subscript an infinite list? Doesn’t subscripting force evaluation of a lazy calculation? Well, yes, but we can put that off:

fcn fibSeq3(a,b) { // a is int, b is int or Deferred
   return(a,       // do NOT force evaluation of b!
      Deferred(fcn(a,b){ fibSeq3(b,a+b) }, a,b) )
}

What we have done is remove the explicit recursion from fibSeq2 and instead made it implicit by returning the next item and a “this is how to calculate the (next+1)th item using this very same function”. I guess you could call this unrolling the recursion.
Aside on the clumsy Deferred syntax: If I used Deferred(fibSeq3,b,a+b), the “a+b” would force evaluation (eager evaluation of parameters to Deferred), which is what I DON’T want. So I wrap the call in another function and let it put off the evaluation of b.
As you may have inferred, a Deferred is just a wrapper on a bit of [unexecuted] code and parameters that is run when something references through it. After the first evaluation, the result is cached for subsequent requests.
Note that this has been very carefully coded to return a number and a lazy; we can key off the number and avoid evaluating the lazy. That is the key to this whole mess.

And that is all there is to it:
    nthFib(Deferred(fibSeq3,1,1), 3) --> 2
    nthFib(Deferred(fibSeq3,1,1), 8) --> 21
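The whole scheme translates to Python fairly directly. A hypothetical Deferred class (force-on-demand with caching, as described above; my code, not the zkl runtime's) makes the mechanics explicit:

```python
class Deferred:
    """A thunk: unexecuted code plus parameters, run on first use,
    cached afterwards."""
    def __init__(self, f, *args):
        self._f, self._args = f, args
        self._forced = False
    def force(self):
        if not self._forced:
            self._value = self._f(*self._args)
            self._forced = True
        return self._value

def fib_seq3(a, b):
    # Return (number, lazy-rest): the lambda puts off evaluating a+b.
    return (a, Deferred(lambda: fib_seq3(b, a + b)))

def nth_fib(seq, n):
    while True:                  # a loop where the zkl version tail-recurses
        value, rest = seq.force()
        if n <= 1:
            return value         # key off the number, never force past n
        seq, n = rest, n - 1

print(nth_fib(Deferred(fib_seq3, 1, 1), 3))  # 2
print(nth_fib(Deferred(fib_seq3, 1, 1), 8))  # 21
```

Only n thunks ever get forced, so the "infinite" sequence costs exactly as much as the prefix you ask for.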

To blow away a little bit of the smoke from the mirrors, here are the steps of nthFib(defer{fibSeq3(1,1)}, 3):

nthFib(Deferred,3) == nthFib(fibSeq3(1,1),3)
n > 1 so call nthFib(Deferred[1],2) == nthFib(fibSeq3(1,1)[1],2)
Deferred[1] forces evaluation of fibSeq3(1,1) -->
    List(1,Deferred) == List(1,fibSeq3(1,1+1))
Deferred[1] --> Deferred == fibSeq3(1,1+1)
so calling nthFib(Deferred[1],2) == nthFib(Deferred,2) ==
    nthFib(fibSeq3(1,2),2)
nthFib(Deferred,2) == nthFib(fibSeq3(1,2),2) --> nthFib(fibSeq3(1,2)[1],1)
fibSeq3(1,2) --> List(1,Deferred) == List(1,fibSeq3(2,1+2))
nthFib(Deferred,1) == nthFib(fibSeq3(2,3),1)
Since n == 1, no more recursion:
answer is fibSeq3(2,3)[0] and again, subscripting forces evaluation
fibSeq3(2,3) --> List(2,fibSeq3(3,2+3))
List(2,fibSeq3(3,2+3))[0] --> 2


No reader wait thread safe objects

One of the things that really sucks about concurrent programming is you are always having to wait on shared data. Have a global table? You have to wait on a lock to write, you have to wait on a lock to read. And locks are not cheap if multiple cores have to sync their caches (I’m thinking about spin locks; OS mutexes really suck time wise). Of course the functional people would say “just say no” to writable tables and Erlang would say no to shared data (I think) but I’m not that advanced and I’m programming in C. It sure would be nice to get rid of locks or at least confine them to writers (especially if the table doesn’t change very much).

So, what would such a table look like? Consider
   void *table[N];
My code could request an entry in the table
   n = registerThing(&blob);
and I could pass n to any thread (perhaps buried in a struct) that wanted to look at a blob. I do this when space is at a premium in a struct and N is small. The best part? There is NO locking, no waits. I think writing will cause cores to resync their caches. Here is how it’s done.

Writer code
CAtomicInt  N;
void       *zklObjectTable[256];
int registerThing(void *blob) {
   int slot = C_ATOMIC_INT_INC(&N); // returns new value
   slot--;
   if (slot > 255) vmThrow(NoVM,E_VM_ERROR,"Too many blobs!");
   zklObjectTable[slot] = blob;
   return slot;
}

Caller (writer) code
blob.name = "foo";
blob.importantStuffOthersNeedToKnowAbout = malloc(bigNumber);
foo = registerThing(&blob);
// a reader in any thread just looks at zklObjectTable[foo]

How does this work? First of all, it requires the ability to atomically increment an int. If I can do that, I can reserve a slot in the table no matter how many threads are trying to do the same thing. And once I have my slot, I can take my sweet time filling it in before I let others know about it. The drawback is you can’t iterate over the table because you could wind up looking at a slot as it is changing.

Here are relevant macros for GCC and MSVC:
GCC:
   typedef int volatile CAtomicInt;
   #define C_ATOMIC_INT_INC(ai) __sync_add_and_fetch(ai,1)

MSVC:
   typedef long int volatile CAtomicInt;
   #define C_ATOMIC_INT_INC(ai) InterlockedIncrement(ai)

          _________________________________________________________________

How about if I have a linked list? Not quite so sweet, as I have to serialize writers, but readers are completely lock free. The magic here is the ability to swap pointers atomically. I’ll show prepending to a list because that is how I add to these types of lists.

Writer code
List volatile *header = 0;
SpinLock     lock;
     // Prepend item to the list in a thread safe manner
     // 1. Serialize threads adding items
     // 2. Atomically prepend the new item to the list header
SPIN_LOCK_ACQUIRE(&lock);
   newListItem->next = header;
   C_ATOMIC_PTR_SET(&header,newListItem);   // magic happens here
SPIN_LOCK_RELEASE(&lock);

So, if 100 threads want to add to the list (at the same time), only one at a time can change the list. That lucky thread will atomically add a new item to the list. This vital step removes the need for readers to lock.

Reader code
for (list = header; list; list = list->next) {
   <LOOK at, don’t touch, *list>
}

Notice, no locks or other cruft. And you can iterate over the list anytime you want. You might or might not get the most recently added item(s), a no-care for my code (the thread that wants an item adds it for iteration).

GCC: #define C_ATOMIC_PTR_SET(pDst,ptr) \
              __sync_lock_test_and_set(pDst,ptr)
MSVC:#define C_ATOMIC_PTR_SET(pDst,ptr) \
              InterlockedExchangePointer(pDst,ptr)


Trees

I was thinking about trees and tree traversal, mostly because I had to write a breadth first traversal in C to look at an inheritance hierarchy (C3 superclass linearization (aka MRO) is way too big a hammer for what I was trying to do). That led to thinking about trees in general and whether it is realistic to write a general tree class. Not sure, but for an example, I took the tree at right. It has six nodes, three of which are leaves; three nodes hold strings (“Root”, “A”, “B”), two are numbers (pi, 3) and one is another tree. In this case it is a B tree but, for this example, there is no limit on the number of child nodes or what each node contains. I’m going to focus on what traversals look like.
Here is a code snippet:

class Tree {
   fcn init(payload="root")
      { var [const] root = Node(payload); }

   class Node {
      var nodes,payload;
      fcn init(payload = Void){
         self.payload = payload;
         nodes = List();
      }
      fcn addLeaf(node) { nodes.append(node); node }
   }
   fcn addNode(node,payload)
      { node.addLeaf(Node(payload)) }
}

A Tree contains one root Node and a Node contains a payload and other nodes. To build the tree in the picture:

tree := Tree();    // := creates temp var
A := tree.addNode(tree.root,"A");
B := tree.addNode(tree.root,"B");
tree.addNode(A,(1.0).pi);
tree.addNode(A,3);
tree.addNode(B,Tree("other"));

OK, now the fun part; traversal. A depth first traversal is easy: visit node, recurse on the nodes.

fcn _depthFirst(tree){
   fcn(nodes,depth){
      foreach node in (nodes){
              // next line processes node
         println("Node: ",node.payload);  // <----
         self.fcn(node.nodes,depth+1);
      }
   }(List(tree.root),0);
}

Here, I use a nested function to make it really easy to package the recursive traversal. But there is a problem; this isn’t general at all. When I put on my “end user hat”, I don’t want to know anything about the tree guts, I just want to

foreach payload,depth in (tree.depthFirstWalker())
   { println(" "*depth*2,payload); }
root
  A
    3.14159
    3
  B
    Class(Tree)

I create the as-yet-undefined depth first iterator, which returns the payload and tree depth for each node, and then I print two spaces for each level of the tree followed by the payload. OK, that was easy, but how to implement it? Generators to the rescue! Or continuations, fibers, etc. The first thing to do is to change the marked line to
                  vm.yield(node.payload,depth);
This will bail out of the traversal but do a state save so the traversal can be resumed. Then, to build the generator:
                  fcn depthFirstWalker { Utils.Generator(_depthFirst,self); }
And we’re done. Unless we would also like to do a breadth first traversal. Which isn’t quite so easy and concise because we have to build the next row of the tree so we can traverse it. But the end user doesn’t need to see that:

foreach payload,depth in (tree.breadthFirstWalker())
   { println(" "*depth*2,payload); }
root
  A
  B
    3.14159
    3
    Class(Tree)
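In Python, plain generators take the place of Utils.Generator plus vm.yield. A minimal Node class (my invention, mirroring the zkl one) is enough to show both walks:

```python
class Node:
    def __init__(self, payload=None):
        self.payload, self.nodes = payload, []
    def add(self, payload):
        child = Node(payload)
        self.nodes.append(child)
        return child

def depth_first(node, depth=0):
    yield node.payload, depth          # the vm.yield(node.payload,depth)
    for child in node.nodes:
        yield from depth_first(child, depth + 1)

def breadth_first(root):
    depth, row = 0, [root]
    while row:                         # build the next row as we walk this one
        next_row = []
        for node in row:
            yield node.payload, depth
            next_row.extend(node.nodes)
        row, depth = next_row, depth + 1

root = Node("root")
A = root.add("A"); B = root.add("B")
A.add(3.14); A.add(3); B.add("other")
for payload, depth in breadth_first(root):
    print(" " * depth * 2, payload)
```

Same shape as the zkl version: the generator hides the traversal state so the end user only sees (payload, depth) pairs.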

And there you have a nice, simple, easy to use and general tree class. Here is all the code:

class Tree {
   fcn init(payload="root")
      { var [const] root = Node(payload); }

   class Node {
      var nodes,payload;
      fcn init(payload = Void){
         self.payload = payload;
         nodes = List();
      }
      fcn addLeaf(node) { nodes.append(node); node }
   }
   fcn addNode(node,payload)
      { node.addLeaf(Node(payload)) }
   fcn _depthFirst(tree){
      fcn(nodes,depth){
         foreach node in (nodes) {
            vm.yield(node.payload,depth);
            self.fcn(node.nodes,depth+1);
         }
      }(List(tree.root),0);
   }
   fcn depthFirstWalker { Utils.Generator(_depthFirst,self); }

   fcn _breadthFirst(tree){
      reg depth = 0, nextRow = List(), nodes = List(tree.root);
      while(nodes){
         foreach node in (nodes){
            vm.yield(node.payload,depth);
            nextRow.extend(node.nodes);
         }
         nodes = nextRow; nextRow = List();
         depth += 1;
      }
   }
   fcn breadthFirstWalker
      { Utils.Generator(_breadthFirst,self); }
}

// —————- An Example ——————
tree := Tree();    // := creates temp var
A := tree.addNode(tree.root,"A");
B := tree.addNode(tree.root,"B");
tree.addNode(A,(1.0).pi);
tree.addNode(A,3);
tree.addNode(B,Tree("other"));

foreach payload,depth in (tree.depthFirstWalker())
   { println(" "*depth*2,payload); }

println("\n----------------");
foreach payload,depth in (tree.breadthFirstWalker())
   { println(" "*depth*2,payload); }

—————————End of source code
C:\ZKL>zkl Tmp\tree.zkl
root
  A
    3.14159
    3
  B
    Class(Tree)

—————-
root
  A
  B
    3.14159
    3
    Class(Tree)
