Sunday, May 29, 2011

Fully Typed LISP

I have been thinking about this concept for at least a couple of years now, pretty much ever since I stumbled upon LISP in the first place.

Let's first lay out a couple of assumptions I have built up over the years.

Compression potential is essentially abstraction potential, or the level of abstraction you can comfortably maintain in a programming language. Assembly has no compression potential aside from functional decomposition, while a programming language with a programmable compiler has infinite compression potential. Lisp and Factor (Factor more so than Lisp, for whatever reason) have almost infinite compression potential. C++, using templates and X-macros, has decent compression potential. Functional languages, with closures, discriminated unions, and pattern matching, have pretty good compression potential for some problems. For others, simple imperative languages clearly dominate.
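To make the X-macro point concrete, here is a tiny made-up C++ example of that style of compression (one field list expanded into both a struct definition and a print function):

#include <cstdio>

// One list of fields, expanded two different ways below.
#define POINT_FIELDS \
    X(float, x)      \
    X(float, y)      \
    X(float, z)

struct Point {
#define X(type, name) type name;
    POINT_FIELDS
#undef X
};

// Print every field without re-listing them.
void print_point(const Point& p) {
#define X(type, name) std::printf(#name " = %f\n", (double)p.name);
    POINT_FIELDS
#undef X
}

int main() {
    Point p = { 1.0f, 2.0f, 3.0f };
    print_point(p); // adding a field to POINT_FIELDS updates both expansions
}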

Optimization potential is the potential for the exact same piece of code to run faster over time due to advances in compiler (or runtime) design. Lisp again had seemingly infinite optimization potential, but we later figured out that much of this potential was impossible to get at (no compiler is sufficiently smart). Still, you will notice that SBCL often places third in the language shootout game, meaning that if you use it in certain ways it can be quite fast. A completely abstract programming language, like a true functional language such as Haskell or perhaps Prolog, should also have very high optimization potential because you are programming against a completely abstract environment.

Correctness potential is the potential for the compiler to check that the program is correct. This requires something of a type system, and all compilers, dynamic or otherwise, have one. Again, fully typed functional languages such as Haskell definitely exploit this, while in C++ the exploitation is left to the discretion of the programmer, and in LISP you have to hope that your tests cover all of your runtime situations, because in dynamically typed languages the compiler isn't helping you much.

There is a key difference between the type of problems caught by a static type system and the type of problems caught by some sort of runtime testing, be it unit tests, integration tests, system tests, black box, white box, or however you want to categorize the tests. A type system guarantees that certain types of problems won't occur. But it doesn't guarantee that the program will ever produce the right answer for any input, or that it will run to completion (in most cases; I know Haskell catches some of these). Thus you always need unit tests, but unit tests can't get complete coverage of even a moderately complex system.

In a highly complex system, such as a physics engine, unit tests help but can't hope to cover even a small percentage of the range of test cases, especially when the right answer is highly dependent upon human perception of what looks correct or some other fuzzy definition. So a type system is absolutely necessary.

So taking all of this into account, I want to lay out the vision and some syntax for a new language, a sort of fully typed lisp-syntax language.

Let's start with the preprocessing stage. At this stage we can't have any types; any sort of compiler extension is going to be roughly of type text->text. Traditional LISP macros work in this range. The output of the preprocessor goes to the typing stage, where the static typing system starts interpreting the data and providing feedback.


So, in this stage, anything that looks even mildly like Lisp gets passed through. At the moment we have only one mode of interaction with the compiler: def-parse-fn.

The compiler language at this point is a traditional, untyped Lisp with the macro syntax of Clojure. Unlike traditional Lisps, lists evaluate to themselves unless their first member is something declared through def-parse-fn. Inside parse-fns, we have the normal family of escape characters of a Lisp, i.e. backtick, tilde evaluation, etc.
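Mechanically, a parse-fn amounts to nothing more than a named text->text transform. A toy sketch in C++ (entirely illustrative; the names def_parse_fn and parse_fns are mine) of what the registry might look like:

#include <functional>
#include <map>
#include <string>

// def-parse-fn amounts to registering a text -> text transform by name.
static std::map<std::string, std::function<std::string(const std::string&)>> parse_fns;

static void def_parse_fn(const std::string& name,
                         std::function<std::string(const std::string&)> fn) {
    parse_fns[name] = fn;
}

int main() {
    // A trivial transform: wrap the incoming form in a progn.
    def_parse_fn("wrap-progn", [](const std::string& form) {
        return "(progn " + form + ")";
    });
    std::string out = parse_fns["wrap-progn"]("(print 1)");
    return out == "(progn (print 1))" ? 0 : 1;
}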

Now we get to another stage, where the type system can start to function. To do this we must, unfortunately, define more syntax.

(defvar <vartype>varname (init-body) (attribute-list))

vartype may be inferred from the initialization body. But it can also be specified formally to ensure the initialization body matches what one expects.

(defn <rettype>fnname (<type1>arg1 <type2>arg2)
(body-list) (function-attribute-list))

Like vartype, rettype can be inferred from the function body, but it can also be specified formally. Both defvar and defn have matching dec{var,fn} forms for simply declaring these objects to the type system while expecting them to be defined elsewhere.

I can't see how clearly this shows through in the blog, but there can't be a space between these items, and types are always enclosed in <type-list>. This makes it a bit more familiar to people used to C++, Java, or C#. It also allows them to pass through the first stage unchanged and with no chance of interpretation.

So, let's talk about defining a structure, or a named data layout possibly associated with some functions.

(deftype typename
(struct
(field name (no-arg-initializer body) (field-attribute-list))
(field name (mutable))
(constructor (arg1 arg2 )
(init-field fieldname init-list))
(destructor)
(property (get-fn) (set-fn))))

I could go on with member functions, but you get the idea. In this case, the typename names a defined layout in memory, along with functions for creating and destroying instances. Fields can specify default initialization values and such, but structs should also be able to specify that they are much dumber, meaning they can be treated like PODs (plain old data types).

Given a type, we would like to associate functions with the type and with an instance of the type (static vs normal functions).

(assoc-type-fn typename fnname): the function is associated with the type.
(assoc-fn typename fnname): the function is associated with an instance. One of the arguments to the function must be of the instance type, and this argument is expected to be elided when calling the function through the instance.

These functions may be used outside of the type definition to associate functions with the type. Whether the association is done inside or outside of the type, if you are calling a function via an association it looks like this:

(type-name fnname arg1 arg2)
(instance-var fnname arg1 arg2)

Breaking with normal Lisp tradition here allows intellisense to work normally. So in essence, instances are functions, where the first argument is the function name telling you which function to call. Ditto for types.

An important point is that getting/setting a field is no different syntactically than getting/setting a property.

(instance-var set prop-or-field-name value)
(instance-var get prop-or-field-name)

This gives a natural migration path that does require a recompilation but doesn't require much else.

At this point we have at least two other interesting meta programming possibilities. The first would be def-fn-macro.


(def-fn-macro macro-name (arg1 arg2) (body))

Note that these run after the parsing stage, so unlike parse-fns they operate on typed expressions rather than raw text.

Another macro type would be a type macro, where the compiler expects a new type to be defined: def-type-macro.

The final statement of a type macro *must* return a new type. This type may have functions associated with it, even static functions, but it must be a type.

Now we turn our attention to the type system.

The type system for a language like C++, setting aside auto-casting, is very simple. It simply asks whether the types match for assignments and function calls.

The type system for Haskell is a theorem proving machine.

I would like to start with the C++ type system and carefully, transparently work toward the Haskell type system in a way that doesn't interfere with existing code. I would also like the type system to be compile-time extensible, such that you can have a type-system macro, but I don't have a clear idea what that would look like yet.

One difference I have noticed between what I want to do and C++ is that in C++ changing a type, which is a compile-time object, requires a runtime function call (which of course may be elided or inlined, but whatever).

Let's say we have a function that only works on integers within a certain range. I would like to declare that function with that range expressed in the types of its parameters.

But I don't want changing an arbitrary integer into the correctly typed integer (or back) to mean any real copy operations, even with a naive compiler. The type is something that, while it may require a runtime check, doesn't fundamentally change the underlying data in the system. So it shouldn't require any data copy or transfer operations. The compiler should ensure this runtime check is done before the function call. You can do this type of programming in C++, but it is tedious, and perhaps that is the best you can do.
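Here is a rough C++ sketch of the idea (the RangedInt name and layout are hypothetical): the wrapper is just the int underneath, the runtime check happens once at the boundary, and nothing is ever copied or transformed.

#include <stdexcept>

// A value known to lie in [Lo, Hi]. Same bits as a plain int;
// the only runtime work is the one check at construction.
template <int Lo, int Hi>
class RangedInt {
    int value_;
public:
    explicit RangedInt(int v) : value_(v) {
        if (v < Lo || v > Hi)
            throw std::out_of_range("value outside declared range");
    }
    int get() const { return value_; }
};

// The function's requirement lives in its signature, not in its body.
int table_lookup(RangedInt<0, 99> index) {
    static int table[100] = { 0 };
    return table[index.get()]; // no bounds check needed here
}

int main() {
    RangedInt<0, 99> i(42); // checked once, here
    return table_lookup(i);
}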

But one large difference between the way C++ is used and Haskell is that in Haskell people really do try to use the type system to ensure program correctness. This type of programming needs to be encouraged.

Also, Eiffel's contracts need to be encouraged, but I believe the way to do this is to enable them to be expressed in the type system. Then let the compiler work out exactly when and how to check the pre-conditions and post-conditions and such. Code that doesn't maintain the types correctly won't get the pre- and post-condition checks.

Now, moving quickly, the runtime. Unfortunately, the standard Lisp runtime is tough to provide if you are type-checking and compiling things, although Haskell seems to have done something like it.

The LISP runtime, however, allows you to replace an existing function *in a running program*. This is actually quite key when debugging certain types of problems, and can make almost all debugging much faster. It also eliminates the possibility of inlining and a whole host of optimizations, so it needs to be used sparingly; or at the very least the programmer needs to be able to enable/disable it for a set of functions.
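A crude static-language approximation of this, just to illustrate the inlining trade-off (my sketch, not a real hot-patching system): route the swappable functions through a mutable pointer.

#include <cstdio>

// Calls go through a mutable pointer, so the body can be
// replaced while the program runs, at the cost of inlining.
static int add_v1(int a, int b) { return a + b; }
static int add_v2(int a, int b) { return a + b + 1; } // the "patched" version

static int (*add_fn)(int, int) = add_v1;

int main() {
    std::printf("%d\n", add_fn(1, 2)); // 3
    add_fn = add_v2;                   // live replacement
    std::printf("%d\n", add_fn(1, 2)); // 4
}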

In addition, there is the technical problem of calling a specific function from a generic data system (the REPL is, of course, generic). This means that every function that needs to be called generically needs at least a generic thunk wrapped around it that takes the arguments from an appropriately sized pointer array and writes the result back to an appropriately sized buffer.
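A minimal sketch of such a thunk in C++ (my own illustration, with made-up names): the typed function gets a wrapper with a single generic calling convention.

#include <cstring>

// The typed function a REPL would want to call.
static int add(int a, int b) { return a + b; }

// Generic calling convention: arguments arrive as an array of
// pointers, the result is written into a caller-provided buffer.
typedef void (*GenericThunk)(void** args, void* result);

static void add_thunk(void** args, void* result) {
    int r = add(*static_cast<int*>(args[0]), *static_cast<int*>(args[1]));
    std::memcpy(result, &r, sizeof r);
}

int main() {
    int a = 1, b = 2, out = 0;
    void* args[] = { &a, &b };
    GenericThunk t = add_thunk; // looked up by name in a real system
    t(args, &out);
    return out; // 3
}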

Anyway, I am attempting to write this typed Lisp language. Note that I didn't say anything about a garbage collection system. I think that while those do make programming easier, they are a large, unpredictable solution to a set of smaller problems. Regions, memory pools, and stack-based construction/destruction all lessen the need for any sort of garbage collection.

Ditto for multithreading. I think Clojure's current facilities for multithreading are exactly the correct ones (especially its STM implementation), but I also think some of the latest advances in the C# compiler, basically enabling continuations, are very important too.

Finally, I am not talking about locally defined closures yet. The three main things I think make programming much more concise, aside from a programmable compiler, are closures, discriminated unions, and more generally pattern matching. Of those, only closures need extensive compiler support so far, and I will get to that support when the time comes.

The other feature of C++ that absolutely must make it in is being able to attach a set of statements that run at any exit of an enclosing block of code. This is a generalization of using objects on the stack whose destructors do cleanup, thus ensuring that code gets run no matter how the function is exited. The generalization would look like:

(on-exit (statement-list)), and the compiler would ensure that the various details around this are handled.
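In C++ terms this is the classic scope-guard idiom; a bare-bones C++11 sketch of what on-exit would compile down to might look like:

#include <cstdio>
#include <functional>

// Runs the stored statements on every exit path of the scope,
// whether by return, break, or exception.
class OnExit {
    std::function<void()> fn_;
public:
    explicit OnExit(std::function<void()> fn) : fn_(fn) {}
    ~OnExit() { fn_(); }
};

int main() {
    std::FILE* f = std::fopen("data.txt", "w");
    if (!f) return 1;
    OnExit close_file([&]() { std::fclose(f); }); // the (on-exit ...) list
    std::fputs("hello\n", f);
    return 0; // fclose runs here, and on any early exit after the guard
}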

Exceptions, at least C++-style exceptions, aren't mentioned because they are non-trivial to implement and are actually still controversial. I do feel that their use in certain contexts simplifies the code, but I also know from experience that you can live happily without them, and their overuse makes the code much less predictable. LLVM provides pretty good support for them, however, and I do intend to support them at some point in the future.

I am working on an initial implementation of the preprocessor; it has taken me a month just to work through the various details and solidify my thoughts on the language compiler this far.

The overall vision is that a programmable compiler will beat a sophisticated one in both the short and long runs, at least in terms of programmer productivity. Furthermore, if we carefully break down the most useful features of most languages, we can add them slowly and carefully when necessary; they don't need to be baked into the base compiler. Extending the compiler allows us to add most features in a more modular and optional way.

Can this be faster than C/C++? I don't know, gcc has 30 years of optimizations behind it.
Can I write safer code in this than in C/C++? I don't know, that depends on how sophisticated we want to make the type system.

I am betting that a simpler low-level system with good programming and modification possibilities beats more complex and developed systems. Over time, I know this means that community agreement and understanding of shared abstractions becomes much more important than with a language with a fixed, standardized definition. A good digression is why this doesn't work with C++. Why does everyone roll their own vector class?

This does give a good platform for experimentation for higher (and lower) level language features.

Sunday, February 14, 2010

I am a dancer

This is extremely difficult for me to say.

Something has changed in the last year or year and a half, and I am really no longer a computer scientist. I am quite good at it and will always work and study hard where it is concerned. But it isn't my passion any longer.

I code when I am at work. I don't code on the weekends and I don't code at night when I am done at the office.

Every evening I am either dancing or stretching and watching videos of dancing.

I do ballet. Twice a week. I bounced around the room attempting tour jetes today and one of my knees is sore to prove it.

I just spent an hour watching Anaheim Ballet videos.

Anyway, I code for a living. But I am a dancer.

Chris

Sunday, December 13, 2009

Lisp C preprocessor

In my daily job I program in C++. It has been that way for 10 years and it looks like it will be that way for 10 more years.

I understand C++ quite well, but there are aspects of the language I find quite tedious. Actually, look, I don't really care to justify my design decisions. I am going to attempt to write a lisp-like language that generates C.

Hopefully I can make it enough like C that the parenthesis-to-bracket compilation is cheap and fast. I would also like to take some stabs at what I believe are C's largest problems, but I need help because this is really tough and I want to incorporate more ideas from other people.

So, to cut to the chase:

The point first and foremost is to allow writing C code at a higher abstraction level. I want to write:

(let [sum (reduce + 0 (int-array 1 2 3 4))]
(cout sum))


I want this to compile down to:


{
int __temp[] = { 1, 2, 3, 4 };
int sum = 0;
for ( int __idx = 0; __idx < 4; ++__idx ) sum += __temp[__idx];
cout << sum;
}


and given:


(let [sum (reduce + 0 (std-vector int 1 2 3 4))]
(cout sum))


I want to see:


{
std::vector<int> vec;
vec.reserve( 4 );
vec.push_back( 1 );
vec.push_back( 2 );
vec.push_back( 3 );
vec.push_back( 4 );
int sum = 0;
for ( std::vector<int>::const_iterator __temp1 = vec.begin(),
__temp2 = vec.end();
__temp1 != __temp2;
++__temp1 ) sum += *__temp1;
cout << sum;
}


Furthermore, I want to be able to define a reduce-like function. Or a map function (with the result provided as an argument; !no malloc assumptions!). You can't do this in either C++ or C, for several reasons. The first and foremost is that, in some senses, you can, but defining the closures is incredibly painful and tedious. Or you can use boost::bind and watch your compile times go through the roof (why would you need to compile and test frequently anyway?).
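To show the tedium concretely, here is roughly what the closure-free, C-style version of that reduce looks like (my sketch): the combining operation has to travel as a function pointer plus an explicit context argument, at every use site.

#include <cstdio>

// Without closures, the combining operation and its state travel
// as a function pointer plus an explicit context argument.
typedef int (*ReduceFn)(int acc, int value, void* ctx);

static int reduce(const int* data, int n, int init, ReduceFn fn, void* ctx) {
    int acc = init;
    for (int i = 0; i < n; ++i)
        acc = fn(acc, data[i], ctx);
    return acc;
}

static int add(int acc, int value, void* ctx) {
    (void)ctx; // unused, but the plumbing has to exist anyway
    return acc + value;
}

int main() {
    int data[] = { 1, 2, 3, 4 };
    std::printf("%d\n", reduce(data, 4, 0, add, 0)); // 10
}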

So, anyway, this is my theory. The language doesn't really matter. Look at what people have done with Factor. Basically, *you* are the compiler with Factor; it is essentially a really sophisticated assembly language. As is Lisp, as is C, as are most languages that don't have true abstract datatypes or some of the other abstractions that functional languages have.

The ease of abstraction and the level of abstractions you can create are what matter.

I need to be able to abstract over types like the templating system of C++ kind of allows you to do. I need to be able to write a compiler-level foreach definition that can be extended for new types quickly. And then reduce simply works using the previous foreach definition.

Imagine the C++ template language didn't suck and actually allowed you to do anything with the compiler you wanted to. C++ programs would become a *lot* more terse and probably much more correct. It's like being able to define your own DSL *very* quickly and efficiently and have it output C++ code.

Then imagine a system that was designed from the ground up to make creating shared libs on any platform and using them from other languages really really easy.

Now imagine some CUDA and OpenCL bindings so you can do hardcore parallel computing very efficiently. Mated with a language like Clojure, you would be able to take advantage of several different levels of parallelism (machine, CPU, GPU) all very efficiently.

Sunday, June 28, 2009

Success Intelligence

I am reading a new book and it asks some interesting questions. Here is one interesting example (I decided to do this while meditating):

Ask yourself "What do I want?"

Think about it for ten minutes.

Ask yourself "What do I really want?"

Think about it for ten minutes.

"What do I really really want?"

Think about it for ten minutes.

There! You have had your meditation time, a little bit of self-exploration, and some life advice all in the same 30 minutes!

Chris

Monday, May 4, 2009

Multithreaded UI

Let's start this discussion with this assumption:

It is desirable to have your user interface as threaded as possible.

Thus we guarantee less about the latency of the updates and more about the potential to display extremely intricate user interfaces on low-parallel machines.

First off, let's talk about levels of parallelism. We are counting threads of execution (TOE), not cores:
low: on the order of tens of TOE (1-99)
mid: on the order of thousands of TOE (100-9,999)
high: 10,000 TOE and up.

Currently, our machines exhibit low levels of hardware parallelism. Some big servers exhibit mid levels of hardware parallelism, and graphics cards as well as supercomputers exhibit high levels of hardware parallelism.

I want to think about extending UI implementations to low levels of parallelism from a single, or largely single execution model present today.

As a simple model, let's take a 3d view with an object, along with a palette where the object's position is displayed. We have two different types of updates: one where the user drags the mouse and one where the user enters a discrete value.

Generically, we have some data model. We have two different views of the data model, most likely with extremely different datastructures used to display the data.

We furthermore assume that rendering is only safe from one thread. This thread renders both views according to some schedule, either on dirty or as needed due to a framerate requirement. Breaking this assumption usually has extreme performance implications *or* it is not allowed (i.e. hard crash), at least with current rendering APIs (OpenGL, .NET, etc.).

Starting from concrete and moving to generic, here is a design that has been slowly manifesting itself in my mind.

Per view, there is some translation from the model to some very view-specific datastructures, where the view's particular renderer iterates over them and takes over.

I propose a thread queue where each view places its individual controller, so the entry points to these translation steps can all be started in parallel. The product of these translation steps is thrown into a threaded queue where the render thread takes over, iterating over them and telling each view to update on its own accord.

In any case, something changes the base model. Now all the controllers need to have a chance to look at the model and decide if they need to produce a new render datastructure. This translation step can be done in parallel for each view. Furthermore, this translation step should translate the model quite far into view specific structures so that the render thread finishes rendering as fast as possible.

So, functionally, we have:

mutator->model, model->view structure(s), view structure(s)->renderer.
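A skeletal C++11 sketch of that pipeline (all the types here are hypothetical): each view's model->view translation runs on its own thread, and only the finished, view-specific product crosses into the render thread through a locked queue.

#include <mutex>
#include <queue>
#include <string>
#include <thread>

struct Model { int object_x; };            // the shared base model
struct ViewData { std::string commands; }; // a view-specific render product

// The threaded queue the render thread drains.
static std::mutex queue_lock;
static std::queue<ViewData> ready_views;

// model -> view structure: runs in parallel, one thread per view.
static void translate(const Model& m, const char* view_name) {
    ViewData d;
    d.commands = std::string(view_name) + ": draw at " + std::to_string(m.object_x);
    std::lock_guard<std::mutex> guard(queue_lock);
    ready_views.push(d);
}

int main() {
    Model model = { 42 };
    std::thread t1(translate, std::cref(model), "3d-view");
    std::thread t2(translate, std::cref(model), "palette");
    t1.join();
    t2.join();
    // view structure -> renderer: the single render thread takes over here.
    // (Safe to read without the lock only because both threads are joined.)
    while (!ready_views.empty()) { /* render ready_views.front() */ ready_views.pop(); }
}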

Ideally this design would allow views that have simple translations to finish and render quicker. Thus a view with a particularly involved translation step wouldn't have the ability to throttle the application.

So, we can at least speed up rendering of complex applications with multiple views in a very basic sense without allowing an involved view to slow down the application. Each translation step should be further threaded if it makes sense for the amount of data and the problem domain but we haven't said anything about that yet so we can't assume anything about the specific translations.

UIs are not written with this design today, but truly sophisticated graphics effects will require giving the machine more room to process complex views and components. Thus rendering of the entire application cannot be held up by the rendering of a complex view, and user interaction should not be noticeably slowed down by the rendering of a complex view.

Ideally you can also design a system where the more expensive views receive proportionally more compute time on a multicore machine; this type of application design requires thinking very hard about your translation pipeline from model to view.

Chris

Saturday, April 4, 2009

App design (internal, read if you want a headache)

I am building an application to create beautiful interactive graphics. This isn't a game engine; it will be used to create really advanced applications.

In any case, I need to write out some of my design ideas because they are kind of stuck in my head.

Application programming with functional datastructures is different than with imperative or mutable datastructures.

This is because with mutable datastructures, you need to take into account the changes to the basic model such that you can inform the rest of the system about what is going on.

Since functional datastructures are immutable, the rest of the system can use a simple identical? comparison test. If not identical to what it was last time (which is a pointer comparison) then something changed.

Now you can segregate your data into groups, link the disparate pieces through ids, and you should end up with a system that has amazing undo capabilities *and* doesn't take a lot of thought to program.

Undo is simple and efficient. Just save the old datastructure. Since you know that the datastructures will have large portions of structural sharing, you can bet that your memory usage will grow slowly and efficiently, perhaps more so since with imperative datastructures you have to remember what changed and save this piece of information separately. Redo is similarly easy.
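As a toy C++ analogue of the identical? trick (my sketch; Clojure gives you the structural sharing for free, and shared_ptr here only stands in for the identity test): snapshots are just saved pointers, and "did it change?" is a pointer comparison.

#include <memory>
#include <vector>

struct Scene { int selected_object; };

int main() {
    // The model is an immutable value behind a shared pointer.
    std::shared_ptr<const Scene> current = std::make_shared<Scene>(Scene{ 1 });

    std::vector<std::shared_ptr<const Scene>> undo_stack;

    // "Mutation" builds a new value; undo is just keeping the old pointer.
    undo_stack.push_back(current);
    current = std::make_shared<Scene>(Scene{ 2 });

    std::shared_ptr<const Scene> last_seen = undo_stack.back();
    bool changed = (last_seen != current); // the identical? test: pointer compare
    (void)changed;                         // true here, so views would update

    current = undo_stack.back();           // undo: restore the old snapshot
    undo_stack.pop_back();
}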

So, basically I intend to have each view simply check the datastructures it cares about and update its internal pieces *if* they are different from what it expects. No events (other than render or update events), no complex transactional-based undo/redo system. Just save the datastructures to named variables and go with that.

It *really* changes the MVC pattern. The controller doesn't need to do nearly as much; it is really much simpler. The views can control how much caching they need by how many references they keep into the main application datastructures. The model can be manipulated very simply and you can still guarantee a lot of details.

Also, using the basic Clojure datastructures means I get serialization for free. They serialize pretty naturally.

So enough about that; I need to figure out how I am going to communicate things to each system.

Looking at the renderer, it simply needs a set of render commands. These commands are very basic, like render this object with these shader parameters to this FBO.

Mouse picking is easy, just run the appropriate render commands in selection mode.

The render commands do not need to be generated in the rendering thread; a lot of pre-processing can go on in other threads to generate very specific and fast render commands.

There needs to be some sort of central state that gets updated and somewhat synchronously processed. So a mouse click should generate a request for a hit test. Next time something renders, that request should be processed and the results, should there be any, sent back to the system.

That hit request should translate into some state changes, like an object being outlined or something. These changes can be done in another thread and a new render-command list be generated that does something else.

So, coming up with very specific design guidelines, I guess this is it.

You are going to have a set of N views. When a piece of the model changes, something should analyze this information and update <= N view-specific datastructures. These updates can be done in parallel, so there is some parallelization ability there.

These updates shouldn't happen on the view or render thread; you should do a bunch of pre-processing on other threads so that you build very view-specific datastructures that translate efficiently into pixels on the screen. Each view may be able to further break down its updating system, but I doubt views could really do this efficiently; it would probably be better to create sub-view-sections and process them independently of each other. Also, views shouldn't update anything in the event that nothing they care about changed.

The renderer's view specific datastructure is the list of render commands it is rendering.

So lets walk through an entire event sequence.

You have an object on the screen.

Mouse down is sent up to the system from the glview.
A hit test is scheduled for the next time something renders and added to the GL todo list.
Let's say you hit the object. This is sent back to the system, where processing takes place that changes an application datastructure such that an object is selected that was not before.
This selection event is processed and all the view-controllers are told a new model is available. These controllers, running in the system's CPU-bound thread, do however much pre-processing they can and change some view datastructures. After the changes are done, each control is told to update.

Now you start dragging. This means that the object should move under the mouse (scale, translate, or rotate, depending). The view's drag handler should send the drag event to the system, where it will do the math required to move the object around. This thread switch may introduce too much latency, but we will see. Next the system will update the scene, and views that care should update their internal representations (like the render commands). And repeat.

The question is: is there too much latency in jumping from the view thread to updating the application's datastructures? Should that happen synchronously while the view updates happen on separate threads?

Anyway, needed to get some thoughts out of my head.

Real hard FBO issue

First off, let's state some of the relevant keywords so that a Google search has a prayer of finding this...

GL_FRAMEBUFFER_UNSUPPORTED_EXT
glFramebufferTexture2DEXT

OK, so here is the lowdown. It is critically important for most of the really interesting things I want to do to be able to allocate framebuffer objects that render their information to textures.

On the Mac, I would create a texture and set it on the framebuffer object, and it would work.
On Windows, I would always get the above error. Now, pray tell, how could it be possible that textured framebuffers are supported on the Mac but not on Windows *on the same machine*?

Well, I checked a lot of things. I thought that perhaps this was because one was using pbuffers and the other was using Swing's FBO pipeline. I changed the window to be an AWT canvas (thus rendering to the native window surface) instead of a Swing GLJPanel.

I set up a simple test case using an example from the web (that failed, btw.)

I downloaded LWJGL, set up a test case, and ran an example (that also failed, btw).

I spent hours searching the internet. This is all in my spare time after work, so I get at most 2 hours a day to work on this stuff, and that only a couple of times a week. One serious GL problem can set me back several weeks.

I also printed out every single variable I could think of that might affect this operation (pixel store state, pixel transfer state, read/draw buffer status, etc.)

Anyway, to stop holding people in suspense, what was causing the allocation to fail?

It was because I wasn't setting the min and mag filters on the texture before I allocated the FBO.

Min and mag filters specify what texture lookups should do when there are more texels than pixels (minification) and when there are more pixels than texels (magnifying the texture). They are attributes that affect how the texture is *read* from memory, not how it is written.

In any case, the documentation for FBOs explicitly states that mipmapped images are not supported, even though the FBO texture attachment call takes a mipmap level.

If you look at the default for the min filter, it is GL_NEAREST_MIPMAP_LINEAR.

Well, NVIDIA cards are picky about this. Apparently, leaving the min filter at a mipmap setting makes the texture object a mipmapped texture object, and the FBO allocation will fail with unsupported (which is true in a pedantic, brittle, poorly-thought-out sense, I guess).

So anyway, a couple evenings of debugging and trying out anything I could think of, and the answer is simple. When you want to use a texture on an FBO, set its min and mag filters to GL_NEAREST or GL_LINEAR, or probably any constant that doesn't explicitly say mipmap in it...
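In code, the fix boils down to something like this (a sketch; it assumes desktop GL headers, a current GL context, and EXT_framebuffer_object available, with any platform-specific includes done elsewhere):

#include <GL/gl.h>

// Returns a texture that can be attached to an FBO without
// GL_FRAMEBUFFER_UNSUPPORTED_EXT on picky (NVIDIA) drivers.
GLuint make_fbo_texture(GLsizei width, GLsizei height) {
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    // The crucial part: non-mipmap filters, set BEFORE the
    // glFramebufferTexture2DEXT attachment call.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, 0);
    return tex;
}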

Jeez what a PITA.