Sunday, December 13, 2009

Lisp C preprocessor

In my daily job I program in C++. It has been that way for 10 years and it looks like it will be that way for 10 more years.

I understand c++ quite well but there are aspects of the language I find quite tedious. Actually, look, I don't really care to justify my design decisions. I am going to attempt to write a lisp-like language that generates C.

Hopefully I can make it enough like C that the parenthesis-to-bracket compilation is cheap and fast. I would also like to take some stabs at what I believe are C's largest problems but I need help because this is really tough and I want to incorporate more ideas from other people.

So, to cut to the chase:

The point first and foremost is to allow writing C code at a higher abstraction level. I want to write:

(let [sum (reduce + 0 (int-array 1 2 3 4))]
(cout sum))


I want this to compile down to:


{
int __temp[] = { 1, 2, 3, 4 };
int sum = 0;
for ( int __idx = 0; __idx < 4; ++__idx ) sum += __temp[__idx];
cout << sum;
}


and given:


(let [sum (reduce + 0 (std-vector int 1 2 3 4))]
(cout sum))


I want to see:


{
std::vector<int> vec;
vec.reserve( 4 );
vec.push_back( 1 );
vec.push_back( 2 );
vec.push_back( 3 );
vec.push_back( 4 );
int sum = 0;
for( std::vector<int>::const_iterator __temp1 = vec.begin(),
     __temp2 = vec.end();
     __temp1 != __temp2;
     ++__temp1 ) sum += *__temp1;
cout << sum;
}


Furthermore, I want to be able to define a reduce-like function, or a map function (with the result provided as an argument; no malloc assumptions!). You can't really do this in either C++ or C. Strictly speaking you can, but defining closures is incredibly painful and tedious. Or you can use boost::bind and watch your compile times go through the roof (why would you need to compile and test frequently anyway?).

So, anyway, this is my theory. The language doesn't really matter. Look at what people have done with Factor. Basically, *you* are the compiler with Factor; it is essentially a really sophisticated assembly language. As is Lisp, as is C, as are most languages that don't have true abstract datatypes or some of the other abstractions that functional languages have.

The ease with which you can create abstractions, and the level of those abstractions, are what matter.

I need to be able to abstract over types like the templating system of c++ kind of allows you to do. I need to be able to write a compiler-level foreach definition that can be extended for new types quickly. And then reduce simply works using the previous foreach definition.

Imagine the C++ template language didn't suck and actually let you do anything you wanted with the compiler. C++ programs would become a *lot* more terse and probably much more correct. It's like being able to define your own DSL *very* quickly and efficiently and have it output C++ code.

Then imagine a system that was designed from the ground up to make creating shared libs on any platform and using them from other languages really really easy.

Now imagine some CUDA and OpenCL bindings so you can do hardcore parallel computing very efficiently. Mated with a language like clojure you would be able to take advantage of several different levels of parallelism (machine, CPU, GPU) all very efficiently.

Sunday, June 28, 2009

Success Intelligence

I am reading a new book and it asks some interesting questions. Here is one example (I decided to do this while meditating):

Ask yourself "What do I want?"

Think about it for ten minutes.

Ask yourself "What do I really want?"

Think about it for ten minutes.

"What do I really really want?"

Think about it for ten minutes.

There! You have had your meditation time, a little bit of self-exploration, and some life advice all in the same 30 minutes!

Chris

Monday, May 4, 2009

Multithreaded UI

Let's start this discussion with this assumption:

It is desirable to have your user interface as threaded as possible.

Thus we guarantee less about the latency of updates and more about the potential to display extremely intricate user interfaces on machines with low levels of parallelism.

First off, let's talk about levels of parallelism. We are counting threads of execution (TOE), not cores:
low : on the order of tens of threads (1 - 99 TOE)
mid : on the order of hundreds to thousands of threads (100 - 9,999 TOE)
high : 10,000 TOE and up.

Currently, our machines exhibit low levels of hardware parallelism. Some big servers exhibit mid levels of hardware parallelism, and graphics cards as well as supercomputers exhibit high levels of hardware parallelism.

I want to think about extending UI implementations from the single (or largely single) thread of execution model present today to low levels of parallelism.

As a simple model, let's take a 3d view containing an object, along with a palette where the object's position is displayed. We have two different types of updates: when the user drags the mouse and when the user enters a discrete value.

Generically, we have some data model. We have two different views of the data model with most likely extremely different datastructures used to display the data.

We furthermore assume that rendering is only safe from one thread. This thread renders both views according to some schedule, either on dirty or as needed due to a framerate requirement. Breaking this assumption usually has extreme performance implications *or* it is not allowed (i.e. hard crash), at least with current rendering API's (opengl, .net, etc).

Starting from concrete and moving to generic, here is a design that has been slowly manifesting itself in my mind.

Per view, there is some translation from the model to very view-specific datastructures; the view's particular renderer then iterates over those and takes over.

I propose a work queue onto which each view places its individual controller, so the entry points to these translation steps can all be started in parallel. The product of each translation step is thrown onto a thread-safe queue where the render thread takes over, iterating over the results and telling each view to update on its own accord.

In any case, something changes the base model. Now all the controllers need to have a chance to look at the model and decide if they need to produce a new render datastructure. This translation step can be done in parallel for each view. Furthermore, this translation step should translate the model quite far into view specific structures so that the render thread finishes rendering as fast as possible.

So, functionally, we have:

mutator->model, model->view structure(s), view structure(s)->renderer.
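
Something like this, as a minimal Clojure sketch of the pipeline (all the names are hypothetical and the queue choice is just one option, not a prescription):

(import '(java.util.concurrent LinkedBlockingQueue))

(def render-queue (LinkedBlockingQueue.))

;; Each controller carries a :translate fn from the shared model to its
;; view-specific structure; translations run in parallel, one future per view.
(defn update-views [model controllers]
  (doseq [fut (doall (map #(future ((:translate %) model)) controllers))]
    (.put render-queue @fut)))

;; The render thread is the only thread that touches the rendering API; it
;; drains the queue and hands each view-specific structure back to its view.
(defn render-loop [render-fn]
  (loop []
    (render-fn (.take render-queue))
    (recur)))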

Ideally this design would allow views with simple translations to finish and render sooner. Thus a view with a particularly involved translation step wouldn't have the ability to throttle the application.

So, we can at least speed up rendering of complex applications with multiple views in a very basic sense without allowing an involved view to slow down the application. Each translation step should be further threaded if it makes sense for the amount of data and the problem domain but we haven't said anything about that yet so we can't assume anything about the specific translations.

UIs are not written with this design today, but truly sophisticated graphics effects will require giving the machine more room to process complex views and components. Rendering of the entire application cannot be held up by the rendering of one complex view, and user interaction should not be noticeably slowed down by it either.

Ideally you can also design a system where the more expensive views receive proportionally more compute time on a multicore machine. This type of application design requires thinking very hard about your translation pipeline from model to view.

Chris

Saturday, April 4, 2009

App design (internal, read if you want a headache)

I am building an application to create beautiful interactive graphics. This isn't a game engine; it will be used to create really advanced applications.

In any case, I need to write out some of my design ideas because they are kind of stuck in my head.

Application programming with functional datastructures is different than with imperative or mutable datastructures.

This is because with mutable datastructures, you need to take into account the changes to the basic model such that you can inform the rest of the system about what is going on.

Since functional datastructures are immutable, the rest of the system can use a simple identical? comparison test. If not identical to what it was last time (which is a pointer comparison) then something changed.

Now you can segregate your data into groups, link the disparate pieces through ids, and you should end up with a system that has amazing undo capabilities *and* doesn't take a lot of thought to program.

Undo is simple and efficient. Just save the old datastructure. Since you know that the datastructures will have large portions of structural sharing, you can bet that your memory usage will grow slowly and efficiently, perhaps more so since with imperative datastructures you have to remember what changed and save this piece of information separately. Redo is similarly easy.

So, basically I intend to have each view simply check the datastructures it cares about and update its internal pieces *if* they are different from what it expects. No events (other than render or update events), no complex transactional-based undo/redo system. Just save the datastructures to named variables and go with that.
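
A minimal sketch of what I mean, assuming the whole model lives in one atom (the names are hypothetical; a ref/dosync version would be the transactional flavor of the same idea):

(def app-state (atom {:objects {1 {:pos [0 0 0]}}}))
(def history (atom ()))   ; a list of previous model values

(defn change-model! [f & args]
  (swap! history conj @app-state)          ; undo data is just the old value
  (apply swap! app-state f args))

(defn undo! []
  (when-let [prev (first @history)]
    (swap! history rest)
    (reset! app-state prev)))

;; A view only re-translates when the piece it cares about actually changed:
(defn view-needs-update? [last-seen]
  (not (identical? (:objects @app-state) last-seen)))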

It *really* changes the MVC pattern. The controller doesn't need to do nearly as much; it is really much simpler. The views can control how much caching they need by how many references they keep into the main application datastructures. The model can be manipulated very simply and you can still guarantee a lot of details.

Also, using the basic clojure datastructures means I get serialization for free. They serialize pretty naturally.

So enough about that; I need to figure out how I am going to communicate things to each system.

Looking at the renderer, it simply needs a set of render commands. These commands are very basic, like "render this object with these shader parameters to this FBO".

Mouse picking is easy, just run the appropriate render commands in selection mode.

The render commands do not need to be generated in the rendering thread; a lot of pre-processing can go on in other threads to generate very specific and fast render commands.

There needs to be some sort of central state that gets updated and somewhat synchronously processed. So a mouse click should generate a request for a hit test. Next time something renders, that request should be processed and the results, should there be any, sent back to the system.

That hit request should translate into some state changes, like an object being outlined or something. These changes can be done in another thread and a new render-command list be generated that does something else.

So, coming up with very specific design guidelines, I guess this is it.

You are going to have a set of N views. When a piece of the model changes, something should analyze this information and update <= N view specific datastructures. These updates can be done in parallel, so there is some parallelization ability there.

These updates shouldn't happen on the view or render thread; you should do a bunch of pre-processing on other threads so that you build very view-specific datastructures that translate efficiently into pixels on the screen. Each view may be able to further break down its updating system but I doubt they could really do this efficiently. It would probably be better to create sub-view-sections and process them independently of each other. Also, they shouldn't update anything in the event that nothing changed that they care about.

The renderer's view specific datastructure is the list of render commands it is rendering.

So lets walk through an entire event sequence.

You have an object on the screen.

mouse down is sent up to the system from the glview.
a hit test is scheduled for the next time something renders and added to the gl todo list.
let's say you hit the object. This is sent back to the system where processing takes place that changes an application datastructure such that an object is selected now that was not.
This selection event is processed and all the view controllers are told a new model is available. These controllers, running in the system's CPU-bound thread, do however much pre-processing they can and change some view datastructures. After the changes are done, each control is told to update.

Now you start dragging. This means that the object should move under the mouse (scale, translate, or rotate depending). The view's drag handler should send the drag event to the system where it will do the math required to move the object around. This thread switch may introduce too much latency but we will see. Next the system will update the scene, views that care will update their internal representations (like the render command list), and repeat.

The question is: is there too much latency in jumping from the view thread to updating the application's datastructures? Should that happen synchronously while the view updates happen on separate threads?

Anyway, needed to get some thoughts out of my head.

real hard fbo issue

First off, let's state some of the relevant keywords so that a Google search has a prayer of finding this...

GL_FRAMEBUFFER_UNSUPPORTED_EXT
glFramebufferTexture2DEXT

OK, so here is the lowdown. It is critically important for most of the really interesting things I want to do to be able to allocate framebuffer objects that render their information to textures.

On the mac, I would create a texture and set it on the framebuffer object and it would work.
On windows, I would always get the above error. Now, pray tell, how would it be possible that textured framebuffers are supported on the mac but not on windows *on the same machine*?

Well, I checked a lot of things. I thought that perhaps this was because one was using pbuffers and the other was using swing's fbo pipeline. I changed the window to be an AWT canvas (thus rendering to the native window surface) instead of a swing GLJPanel.

I set up a simple test case using an example from the web (that failed, btw.)

I downloaded lwjgl, setup a test case, and ran an example (that also failed, btw).

I spent hours searching the internet. This is all in my spare time after work, so I get at most 2 hours a day to work on this stuff and that is only a couple times/week. One serious gl problem can set me back several weeks.

I also printed out every single variable I could think of that might affect this operation (pixel store state, pixel transfer state, read/draw buffer status, etc.)

Anyway, to stop holding people in suspense, what was causing the allocation to fail?

It was because I wasn't setting the min and mag filters on the texture before I allocated the FBO.

Min and mag filters specify what texture lookups should do when there are more texels than pixels (minification) and when there are more pixels than texels (magnification). They are attributes that affect how the texture is *read* from memory, not how it is written.

In any case, the documentation for FBOs explicitly states that mipmapped images are not supported, even though the fbo texture attachment call takes a mipmap level argument.

If you look at the default for the min filter, it is GL_NEAREST_MIPMAP_LINEAR, a mipmap filter.

Well, NVIDIA cards are picky about this. Apparently, leaving the min filter at its mipmap default makes the texture object a mipmap texture object, and the fbo allocation will fail with unsupported (which is true in a pedantic, brittle, poorly thought out sense I guess).

So anyway, after a couple evenings of debugging and trying out anything I could think of, the answer is simple. When you want to use a texture on an FBO, set its min and mag filters to GL_NEAREST or GL_LINEAR, or probably any filter constant that doesn't explicitly say mipmap in it. (CLAMP and CLAMP_TO_EDGE, for the record, are wrap modes rather than filters.)
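
For the record, here is roughly what the allocation looks like from clojure/JOGL with the filters set first. This is a sketch with hypothetical names, not the actual code from my app:

(import '(javax.media.opengl GL))

(defn make-fbo-texture [#^GL gl width height]
  (let [handles (int-array 1)]
    (.glGenTextures gl 1 handles 0)
    (let [tex (aget handles 0)]
      (.glBindTexture gl GL/GL_TEXTURE_2D tex)
      ;; the crucial part: the default min filter is a mipmap filter,
      ;; which the driver will reject when the texture backs an FBO
      (.glTexParameteri gl GL/GL_TEXTURE_2D GL/GL_TEXTURE_MIN_FILTER GL/GL_LINEAR)
      (.glTexParameteri gl GL/GL_TEXTURE_2D GL/GL_TEXTURE_MAG_FILTER GL/GL_LINEAR)
      (.glTexImage2D gl GL/GL_TEXTURE_2D 0 GL/GL_RGBA width height 0
                     GL/GL_RGBA GL/GL_UNSIGNED_BYTE nil)
      tex)))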

Jeez what a PITA.

Monday, March 23, 2009

Getting what you deserve

Money has always been relatively easy for me to come across. I think being half mathematician makes it easier for some to obtain and manage money. In any case, ever since I could make it I have never wanted for money. This is a fairly messed up fact in the grand scheme of things.

Let's talk about what people really deserve. I do not think there is such a thing as deserve. There cannot be, for if there were, humans would deserve a lot of pain. That seems extreme.

Millions of children die of diseases that even given my quite opulent life by global standards I would never want to deal with. How could they deserve what they got and I deserve what I get? It isn't like I save children from burning orphanages while caring for the sick in my spare time.

Money, of course, isn't everything but health and money together are a lot. I have both and millions (or billions) of people will never have a chance at either.

So, knowing all of this, how can you be OK with money? Why not run around and devote your life to charity (this also doesn't feel right to me)? Then you surely must deserve what you get.

Except when I volunteered a lot I didn't really feel any better about myself. Every once in a while you would connect with someone and then you would feel great for a little bit, but it never lasted. The organizations weren't very professional and I always felt that a bit more could have been done if someone really took getting the job done a little more seriously. Perhaps my standards of what is professional are a little off.

So in any case, it seems impossible to make a logical argument for deserve. A very large part of what happens to someone really is just dice. Luck, destiny, call it what you will; shit happens, and sometimes you win and sometimes you lose.

Now there is a really high chance that some of the people I am arrogantly feeling sorry for, for not having lots of scrilla, really are a lot happier than I am. Being happier than me really isn't that hard. But I don't feel unreasonably unhappy; I feel like I am happy enough to see opportunities where they lie, but I have enough reserve that when something takes real discipline I can stick with it.

So is happiness the point of human existence? If deserve doesn't exist, then what does happiness mean? It would have to mean that you are just OK with your circumstances. You don't really owe anyone anything and they don't owe you.

Sunday, January 25, 2009

It's days like these

That fucking rocked. Although it didn't rock at the time, now I feel pretty good.

I decided to get my project working in windows. Since it is a clojure opengl swing program I figured this wouldn't be anything but a quick check and perhaps a couple more function calls.

Thus, instead of quickly starting work I messed around with emacs. My emacs is half-pimped at this moment. I have line numbers, tabs at the top of each frame (in windows and mac), and the theme I like. I have a github project and the same .emacs file runs on both windows and mac.

I also pimped out my clojure setup a bit. I have everything (slime, clojure, clojure-contrib, swank-clojure, clojure-mode) all synced up under one dev directory. I can go into each sub directory and sync them, rebuild jars where necessary and just open emacs again and I am running with the latest. No hassle, everything just works (for now).

This all happened yesterday for the most part. I woke up this morning thinking about emacs for some reason and immediately started coding. My project now has built-in slime (remote debugging) support. If you want to connect to the running project, there is a menu item to open up a swank port and start an internal swank server. Now in emacs you can connect to this, start the slime repl, and you are doing pretty good. I don't have to start my program from the repl any more in order to change or examine it.

All this before starting to port my stuff to windows. I did the internal swank server on the mac.

Now, I start running my program under windows and it doesn't work for shit. The first problem that I hadn't anticipated was that for some reason, under windows swing, I am unable to allocate an FBO with a texture. In order to render an anti-aliased scene I have to allocate a multi-sample fbo, render to that, blit to a single-sample fbo and then render the result to the main screen. The easiest and by far fastest way to do this is to set up a texture as the downsample fbo's color attachment. This works like magic on the mac; no problem.

So, I can't allocate this type of FBO on windows. This sort of fundamental difference on literally the same computer should have fired off a warning in my head. Anyway, I worked around it (using glCopyTexImage2D to copy the read buffer off the downsample fbo; this is much slower but it works) and I still saw nothing. I now have two rendering paths, one for if you can allocate a texture fbo, and one for if you can't.

Back to the annoying days of holy shit, what the hell happened?

To make a very long story short, I have all of my java defines enabled for uber fast rendering. Crazy things like:


-Dsun.java2d.opengl.fbobject=true \
-Dsun.java2d.opengl=true \
-Dsun.java2d.translaccel=true


Note the opengl and fbobject flags. They change, very deeply, the semantics of the system w/r/t opengl and they speed up rendering a *lot*.

So, in no particular order, here are the things I figured out.

Under the mac, binding fbo 0 effectively sets you up to render to the main screen. This follows, as the mac isn't using an intermediate fbo to render to; it is using a pbuffer, which is a different entity altogether although they do similar things. On windows, with the fbo option enabled, fbo 0 is *not* the main buffer you are rendering to. Thus for the longest time I saw absolutely nothing. White, in fact, which is a little weirder than black or uninitialized memory in my opinion. I have no idea how it occurred to me this was happening but it was just one of many hard problems.

Under the mac, the gl viewport is set to 0, 0, width, height. On windows, it was more like 0, 200, width, height, because swing was rendering to an fbo and my GLJPanel was only rendering to a portion of said FBO. Thus the viewport was set up differently *if* you have sibling swing controls.

For identical reasons, the windows swing implementation uses glScissor tests to ensure you *can't* render outside a given area regardless of how you mess with the viewport. This is a good idea, as you can't trust what another client or control will do. I hadn't taken this into account, of course, because the mac doesn't do it. The effect was that when I rendered to the multi-sample fbo I only saw a portion of my scene; it took a goddamn long time to establish whether my full-screen-quad rendering was messed up or fine so that I could start looking at other options. The solution was to disable the scissor test while rendering to the temporary fbos and re-enable it afterward, before rendering from the downsample fbo to the main buffer.

The weirdest bug *by far* was the difference in the glsl interface. The mac gives you every single shader attribute in order, whether or not you can mess with it. So you will see program attributes that look like "gl_Vertex" even though you can't set them yourself (gl takes care of that).

Windows, on the other hand, only showed me the editable attributes. I had been using the attribute's enumeration index as its gl handle on the mac, but on windows this failed in the oddest ways. It took a very long time before I started looking around the glsl code and replaced the one implicit reference to an attribute index with a call to glGetAttribLocation.

Now the program started crashing when I switched from my functional graphics demo to my wave demo. I eventually figured out this was because I wasn't protecting the underlying gl implementation from my changes. Specifically I was using the gl vertex vbo only for my wave program. My multisample system used gl attributes for everything but I like to explore all options so I did it differently in the wave section.

Somehow, leaving the vertex array enabled made the outer program crash hard. Actually, it makes a lot of sense why, I guess; they must be using glDrawArrays or glDrawElements to render the swing pieces to the fbo. Anyway, a couple of glPushAttrib and glPushClientAttrib calls (along with their popping siblings) solved that problem quickly once the scent had been found.
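
The fix is basically to leave the gl state the way swing's pipeline expects it. A sketch of the shape of it (hypothetical names, clojure/JOGL style):

(import '(javax.media.opengl GL))

(defn with-protected-gl-state [#^GL gl draw-fn]
  (.glPushAttrib gl GL/GL_ALL_ATTRIB_BITS)
  (.glPushClientAttrib gl GL/GL_CLIENT_ALL_ATTRIB_BITS)
  (try
    (draw-fn gl)   ; enable vertex arrays, bind VBOs, whatever the demo needs
    (finally
      (.glPopClientAttrib gl)
      (.glPopAttrib gl))))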

Now my program works, both demos, on windows and the mac. This probably indicates it will work on linux, as both windows and linux use sun's swing implementation. The mac uses apple's, and thus there are going to be hardcore, non-trivial differences between them.

I really didn't want to try to support windows. But unlike linux it is installed on my mac laptop and I can honestly say that my emacs-foo, my swing and opengl foos are a level higher because I took the time to work through the issues. My slime-swank-foo is becoming somewhat formidable.

Chris

Thursday, January 15, 2009

Interesting clojure issue

Worn out from complaining too aggressively

There is a lot I draw from this.

First off, people aren't replying to him on the email list. I find this incredibly lame because I believe, regardless of how he is raising it, that he raises a legitimate point that will cause consternation among people.

The basic thing is that:

user> (= (list 2 3) [2 3])
true
user> (hash (list 2 3))
-1919631535
user> (hash [2 3])
1026
user>

'=' semantics do not match hash semantics exactly. This means that if you intermix lists and vectors as keys in a hash map then you are going to get really odd results:

user> { [2 3] 4 (list 2 3) 5 }
{[2 3] 4, (2 3) 5}
user> (filter (= [2 3] %) (keys *1))
; Evaluation aborted.
user> (filter #(= [2 3] %) (keys *1))
([2 3] (2 3))
user>

You might ask how one would get into this situation, but you will get into it in some subtle ways. For instance, (map identity [2 3]) gives you something equal to [2 3], but it won't behave as the same key in a hash-map.
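
For example, with the hashes behaving as they do today, a REPL session can go like this (using hash-map explicitly, since tiny map literals are array-maps and sidestep hashing):

user> (def m (hash-map [2 3] :found))
#'user/m
user> (get m [2 3])
:found
user> (= [2 3] (map identity [2 3]))
true
user> (get m (map identity [2 3])) ; hashes like a list, so the lookup misses
nil
user>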

I personally, now that I know about this, have absolutely no problem working around it. People who are used to c++ have had to work around much, *much* worse (like today I worked around a heap corruption problem that caused a failure in an unrelated piece of code).

People who are used to Java are used to a *completely* normalized environment. This is the ideal that we all strive for; no idiomatic problems that you have to deal with; just pure mathematical consistency. And perhaps a comfortable death by carbon monoxide.

I have never seen anything really cool that is that vanilla. Try using opengl in some advanced case; use c++ for a serious task (and succeed), play with CUDA. Badass tech makes sacrifices and sometimes these sacrifices are exactly in some area that causes serious problems. It is like really good scotch; you have to come to it and just take it for what it is. Judge it because it fails to wipe your ass for you and you miss out on some really fucking awesome engineering.

I think that Rich should probably fix this. But if he doesn't it is a problem that I can trivially work around. I would *much* rather use clojure because I think it is a super fun system to play with and I know that I can work around anything it throws at me, no question. I did it with .NET, I have done it with c++, I can trivially do it with the JVM and clojure.

Abstractions have leaks. And the really good abstractions still have really painful leaks. Learn them, understand them, and move on. Save the judgement for someone who cares.

Chris

Monday, January 12, 2009

Why I love clojure

There is a fundamental fact about programming languages; or rather my interaction with them. I love the newb feeling.

This is a dangerous addiction because it doesn't take long before you never have the newb feeling; at most I have another couple years where I can find languages that fascinate me.

I just thought it was damn fun trying things out in the repl and just finding interesting things.

Somehow, c++ has lost that interest for me and I think I know why.

First off, it is just butt ugly looking. Templates look like shit; although I think they are cool as a compiler-extension mechanism.

Second off, you get problems like this. Today at work, working with a large piece of legacy code that I didn't write, a co-worker and I added something that made the program crash during a memory-reclaim operation (embedded game engine; thus it partially does its own memory management).

We went through the obvious possibilities; the most likely being if the object were deleted twice somehow. That wasn't the case, so we wandered around the code. Finally I decided to try changing the order of inheritance for an object with multiple inheritance. That fixed the problem.

Somewhere in the code, there are lines that reinterpret_cast something they should be using static_cast for. The project doesn't have dynamic_cast enabled so that is out of the question. That pisses me off but is only one of about 100 things about it that piss me off.

Anyway, the difference is that with multiple inheritance, the actual pointer value can change during an upcast or a downcast. This is because of the way c++ objects are laid out in memory and the vagaries of v-table implementations.

This is, coincidentally, why the diamond of death is such a big deal in c++. You end up with two representations of the top of the diamond in memory. Thus:

A
B C
D

A would be in D's memory allocation twice. Thus if you did static_cast<A*>(static_cast<B*>(d_instance)) you would get a different answer than if you did static_cast<A*>(static_cast<C*>(d_instance)). Finally if you want to avoid all of this you can use virtual inheritance in c++. This looks like class B : virtual public A.

Then, however, access to the base class's data takes longer because there is an extra pointer indirection in the middle.

All of these details distract you from getting your algorithm perfect or doing a very, very good design and thus I think that most c++ programs are fundamentally worse designed than a lot of programs in other languages.

You can really concentrate on only so much simultaneously. The more you are focusing on the details of an arcane language the less you are focusing on your algorithm and its fail-cases. Or the bigger picture in the sense of where this piece of code fits in the system; or how minimally you can accomplish the task.

The pain of refactoring c++ leads you to do an inordinate amount of up-front design which is always worse unless you are solving the same problem again which you never do.

Finally, TheEyeStrainCausedByLookingAtCamelCaseClassNames, CamelOrJavaCaseVariableNamesAlongWith half_of_the_language_using_underscores_which_no_one_uses_even_though_the_damn_language's_standard_library_is_written_with_it. Mix this with reading about 10 times the characters (literally). And what exactly does this mean?
(this->*item[index])(reinterpret_cast<SomeStupidClassName*>(*data)[4]);

And tell your coding conventions to fuck themselves. I can write better, more understandable code without conventions than you will ever touch with your 10,000 lines of coding conventions, most of which have never been proven to improve the readability of code. Every second you spend on coding conventions could have been spent on good or, god forbid, great design; and frankly, it is pure bad design that makes things hard to work with, not code conventions. So unless you have spent the time going through thousands of programs, all in different languages, all with different coding conventions, so that you have the breadth and depth of knowledge to form an even semi-informed opinion on what makes code understandable to other people, do not write a single line of a coding convention. Because frankly, you don't have any idea what you are talking about.

Man, now that is done with. I doubt I would hate code conventions as much as I do if I hadn't worked in a language that requires so much goddamn typing because its base levels of abstraction are just too low to express the vast majority of concepts clearly or precisely.

It was brilliant of java and c# to take the route that they did: take the worst aspects of c++'s type system and continue them. If you want types, use good type inference. If you don't, use a dynamic language. Piss-poor required static typing is just a waste of characters. Since every line you write is a liability, it leads to code that is overly verbose, hard to refactor, and *requires* sophisticated editors to manipulate in some excuse for efficiency.

I have arrived at the point in my career where I would take slow, interesting, concise code over fast, boring-as-hell, tedious code. I have never had an unsolvable optimization problem; I have written mmx assembly and x86, used Intel's SSE and a bit of AMD's 3DNow vector instructions, not to mention some insanely fast linear algebra for large datasets using CUDA.

Some things are just tedious and kind of suck. Most things can be done elegantly. Clojure makes doing the tedious no more tedious than it was going to be anyway and doing the elegant simple and interesting.

Chris

On the fact that camel case is stupid.

Capitalization is something I do as little of as possible. I write my name Chris, and I talk about NVIDIA. After that I hate it; it's just that 'i' makes you look like a little kid, sort of like 'i like stickers' and run-on sentences. I like the capitalization of letters at the beginning of a sentence because my eye gets lost without it, as I sometimes miss the period. I have tried not capitalizing things altogether and I had to stop.

CamelCaseNamesDriveMeCrazy. asDoJavaCaseNames,Really. dash-case-is-the-easiest-for-me, while_underscore_case_is_a_bit_worse_but_I_can_handle_it.

I-remember-my-grandfather-use-to-write-using-dash-case-when-leaving-notes.

"Anyone-caught-running-water-into-this-basin-will-be-in-trouble. Accidental-or-other-wise!" was something he wrote.

What about syllable-case?

ChrisToPherPeTerNuernBerGer

graphics file formats

It is an odd fact of life for 3d programmers that the file formats *really* suck.

The collada specification *just the spec* is a 3.7M pdf. It doesn't have a lot of pictures.

The collada xml schema is over 400K bytes and clocks in at 11,848 lines.

The FBX toolkit has a c++ library as a file specification. They can't be bothered to publish their file format, even though it is extremely regular and pretty damn well designed by my estimation. You can see it in ascii, but I wonder if the binary version is identical to the ascii or if there is something neat about it.

Then check out lib3ds, try checking out a maya file. Obj is ok if you need something fairly simple.

Supporting any one of these in an application takes a huge amount of work.

Out of all of these I only really know collada. I did a 3ds importer a long time ago. One time I reverse engineered the excel .xls format going from partial documentation and the source code to OpenOffice's source code. Even with all of that, it took a *lot* of effort. Lots of hex dumps. Anyway...

Collada is the best out of all of them in my opinion. I bet I am one of a very, very few knowledgeable graphics developers who believe that, but it is. They all suck, but at least this one has an open development process along with a few large companies pushing it. It isn't a game engine data format; xml is big, verbose, and extremely slow to parse compared to fairly trivial binary file formats. It isn't the best designed piece of software I have ever seen, either. It isn't modular nor entirely consistent (like something of that size could be consistent, but anyway); in some places it is way overly generic and arbitrary (animations, for example: there are at least 5 different ways to store bezier and linear key data. 5. FX Composer, 3ds max, maya, and another 3d editor whose name escapes me now because they aren't sold in the US all do it differently). It doesn't have nearly enough examples of what could be considered the right way to do things; thus everyone has just rolled their own.

But, it is something people can rally behind and we can all use it. I think the way collada does transforms is really, really good. They are the source of a lot of problems, but you can do anything with them. Really goddamn good. I spent a huge amount of time getting them to work, but I think they are cool. I also think the way they store geometry data is pretty cool and lends itself perfectly to using opengl efficiently (vbos). I am not a huge fan of its shader sections; the interaction between shaders and scenes (using evaluate_scene tags) kind of annoys me. I just don't think of shaders, be they a special material on an object or a multi-pass effect, as part of the same model as the scene, and it irks me.

I like collada's extension mechanism but it is clunky. Certain really odd situations happen with different interpretations of the specification.

I wish they had used certain elements of xml schema and done things a little differently in others. For instance, they don't break up logically different spaces using sub-schemas and xs:include. I don't know if this is for technical reasons, but it makes interpreting the entire specification tough because you *have* to look at a lot of stuff.

Why isn't there a collada-geometry schema that includes a collada core? Why isn't there a collada-animation schema, completely separate, that includes the common bits? Why doesn't the file format start with their extension mechanism and use that to make pieces more modular? Perhaps you can't really have a unified ID-space if you are using different schemas? It may break schema verifiers, which for something like collada are particularly useless anyway, as it does lots of outside-schema stuff and certain large vendors produced files that didn't pass verification as it was.

I believe the FBX file format has a superior architecture, but it isn't open and it has weird characteristics that really irk me. I wish it had collada's geometry section, and for the record the way collada at least partially allows a good design for shaders and effects blows FBX out of the water. FBX starts with a very generic design (essentially objects with arbitrary properties) and uses convention and their API to ensure consistency. The details of the data in the format aren't as smart as Collada's, however, and that kills it for me.

They both are light years better than 3ds files. I bet the maya format is pretty well designed, as that app is really in a league of its own with regard to amazingly smart design. Also, for the record, Adobe can do some unbelievably cool things with applications. After Effects, even after all of these years, is an amazing application. Their plugin API, while tedious and sometimes very poorly thought out, is documented with humor, and looking at some of the things they allow is quite enlightening as to the application's internal architecture.

This is perhaps the weirdest thing I have admitted for a long time, but file formats are really fascinating to me. API's almost always bore me or piss me off. But for some reason file formats are interesting; especially old ones. The excel file format is really damn cool, and some of the things that the microsoft applications could do were legitimately goddamn tough to do in binary. Things like in-place editing, where the app could run directly from the file without ever loading the entire file into memory at once and could write to the file in the same locations without growing its size (or growing it in a very generic way). Meaning the application could memory-map the file and then instantly use structures with no further loading, theoretically. It could also perform edits to this file while simultaneously growing it. This isn't shit tricks done with malloc or new; this is a really good, consistent, and darn hard engineering challenge that would affect the design of your entire application. On the other hand, the applications take forever to load files now and you have to wonder. Meanwhile KOffice tends to insta-load apps so fast it can be somewhat shocking.

I hate microsoft office; but I believe that in odd corners of it you find some of the most amazing engineering I have ever seen. Excel is an application that has yet to be equalled (or even close) by an opensource alternative. The best open source office suites just don't even compare to Microsoft's tools in terms of quality, features, or unified, consistent design. I guess I just hate office software in general as just the look of it forces me to puke instantly while seeing images of baby Jesus crying.

/ramble

Chris

Wednesday, January 7, 2009

Project update

Finished the first few sections of the refactoring of the project. I believe I will eliminate about 1/4 of the code by removing dead code and refactoring carefully, as well as structure the code in a way that doesn't require another dev to know everything all at the same time.

I am giving a demo of its functionality to other NVIDIA people Friday so I can show off some of the really cool features. I noticed that an artist decided to attend the demo, which means I need to think a lot harder about the presentation and the audience. I intend to give another demo to the functional programming languages group later.

I want to present a couple of things. Thinking about it now, I would like to be able to present each feature in a way that everyone at the demonstration, regardless of technical ability, can really understand why it is important.

First off, I want a few of the more impressive features of the app clearly demonstrated. The live shader editing is going to be a hit, as well as perhaps some of the user interface details. This presentation is certainly not about the graphics, however. When this super clear, thorough, and much better than I could have ever written it in my entire life tutorial is working then I will have some eye candy.

I want to talk about the architecture behind the system, especially about the really heavy usage of threads. I guess I need to talk about software transactional memory and exactly why writing really safe multithreading code is much easier with this paradigm. Which means I need to explain functional datastructures, structural sharing, and software transactional memory.

I want to show off the repl a little bit. I have emacs very lightly pimped out and I know how to use the repl, so that should be fun. Having a repl may beat having an entire IDE. This is assuming you design your systems to be run from the repl. Not sure exactly how to prove this, but I just want to make the differences in the workflows very apparent.

So why is the repl so much better?

Having the repl is like having your debugger and your source code editor always running and always checking things out. You can weave in and out of editing and checking results/debugging very fluidly; this makes the distinction between writing and testing or debugging code disappear. Furthermore, being able to write small pieces of things and test them out immediately quickens your adoption of more advanced language features. Finally, you tend to test functions quite thoroughly from the repl. It is much quicker to test them and just look at their output from the repl than it is to write a unit test, run through the failures and debug the unit test.

This had the unfortunate side effect of eliminating the testing codebase I usually write; this I am not quite as pumped about as it means another dev will need to be *much* more careful about what they are doing. Well, win some and lose some.

I wonder how hard it would be to have emacs connect up to another running process? Emacs always starts the process; it can't be all that difficult to have emacs look for a connection at a given port. I wouldn't think it would be too much code to write in your app to have it open up a connection and start whatever clojure-swank starts...

Man, clojure is cool.

Chris

Saturday, January 3, 2009

First Clojure Project

Finished my first clojure project yesterday, at least finished functionally.

I wrote a small program where you have some 3d graphics running and you can change the anti-aliasing settings, the arguments to the glsl shaders, and a few other things.

It mainly uses clojure, JOGL and swing. It runs doing 4x antialiasing of the screen on my system, using about 35% of one of my two cores. Most of that time is in opengl or swing, I am not sure which. I have figured out that for this program, java 1.6 significantly outperforms java 1.5. This may be due to a much faster swing rendering implementation that is itself rooted in opengl, or the fact that the 64-bit java implementation is faster, or a number of things.

When I have thought of a better name for it I will move it to a new github repository and people can try it out. You would need to install java, JOGL, clojure, and clojure-contrib.

So, here are some of the harder issues encountered doing this:

1. Figuring out which windowing library to use was horrid because I tried QT first. It hung or crashed for days; I tried installing different jvms and all manner of things. Mixing QT with JOGL and clojure really was a waste of time; QT-java, for the mac, just isn't ready for primetime. This was the only real issue that really wasn't fun to solve.

The rest were implementation and debugging issues (these were fun to figure out).

2. Laziness confused me several times. One time it was classes: I called map and didn't use the return value, so nothing happened; I wanted the side effect of writing out to a file. Another time I used a lazy seq while doing opengl stuff and then tried to look at this list in the REPL. This caused the entire jvm to crash with a bus error, because the repl and opengl run in different threads and calculating the result on demand was causing the list to be evaluated in the repl thread (see the first sketch after this list). This was the type of bug that, when you figure it out, you know just marked a significant step in your path to functional zen.

3. Swing layout systems. Every goddamn time I use these I spend forever on at least one stupid problem. GridBagLayout-4-eva. At least with a repl I could literally change the code, hit return and see the results. It really was kind of interesting.

4. REPL madness. I managed to get the program into all sorts of weird states by reloading files in the repl. I produced another bus error, I got things to hang. Lots of stuff. I would sit there and change opengl commands to all sorts of stuff, hit return, and laugh as it did something weird. I would exit, build the project jar, restart, and the program would behave differently because I hadn't loaded a file that I had edited or something along those lines.

5. java.nio.Buffer. There was a bug that took me a long time to solve when I switched to vertex buffer objects. The way you pass these to JOGL is using the newer java.nio.FloatBuffer (or CharBuffer or ShortBuffer...). Anyway, I would hit render and nothing would happen. At first I thought it was perhaps my vertex shader. Then I was thinking it was in the binding of the vertex buffer object to this particular vertex shader property. Still nothing. Finally *finally* after working on this for like 2 hours, I looked at the API for java.nio.Buffer. It has a flip function. I deduced that when you fill a buffer from an array, its position member variable gets set to the end of the buffer. The flip function sets this variable back to 0, which sets it up for the read operation somewhere on the JNI side of things (see the second sketch after this list).

Super, super brutal mistake caused by a few things. First off, when you make the JOGL call you pass in an explicit byte-count argument. I figured if JOGL had this information, it should be able to take the buffer and just make it work. Second off, I didn't even realize that a java.nio object *had* a flip function. Third, I had never used anything with a flip function, so the initial read over the API didn't help anything out.

6. glVertexAttribPointer takes *byte* arguments. I was giving it an array of floats, so I figured its arguments would be float-sized, since it knew the buffer that was bound at the time of its invocation and thus knew the datatype in the buffer. After getting extremely bizarre results for another couple hours I figured this out. This was unfortunately when I was hooking up the antialiasing code, which renders to a multi-sample fbo, then downsamples this multi-sample fbo to a single-sample (non-antialiased) fbo, and then finally renders this to the screen. Get it? So there were a lot of links in the chain that could have failed. Oh yeah, and I had to render a fullscreen quad for the final step to the screen, so I had this other piece of the chain that was failing for a while.

7. The general problems related to running a realtime rendering system. These include: Swing completely destroys the opengl render context every once in a while on resize. I wasn't initially planning to handle this condition because it would traditionally only happen if you ran a screensaver or something like that. This isn't a rip on swing; I understand why they render to pbuffers and thus why they have to reallocate them sometimes. The java debug opengl interface, instead of only throwing an exception once upon error and resetting, throws an error for every single opengl call after the error is made. Thus I would get exception stack frames printed to the repl at 60 frames/second.

The upside of most of the problems in 7 is that I built a much more robust rendering platform much earlier than I was planning. So it automatically reloads files that have been kicked out of the system, rebuilds vertex buffer objects, and is generally pretty tough.
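
Here is a minimal sketch of the laziness pitfall from item 2; nothing graphics-specific, any side effect will do:

(defn side-effecting-work [x]
  (println "doing work on" (.getName (Thread/currentThread)) "for" x)
  (* x x))

;; lazy: nothing happens until someone walks the seq (the REPL printer, say,
;; which may be a completely different thread than the one you expected)
(def lazy-results (map side-effecting-work (range 5)))

;; eager: doall forces the work right here, on this thread
(def eager-results (doall (map side-effecting-work (range 5))))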

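And a sketch of the java.nio position/flip behavior from item 5; plain java.nio, no JOGL required to see it:

(import '(java.nio FloatBuffer))

(let [buf (FloatBuffer/allocate 4)]
  (.put buf (float-array [1.0 2.0 3.0 4.0]))
  ;; position now sits at the end; a reader starting here sees nothing
  (println "before flip:" (.position buf))
  (.flip buf)
  ;; flip rewinds position to 0 and sets the limit, ready for the read
  (println "after flip:" (.position buf) (.limit buf)))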

OK, so enough about the problems. What did I build? What did I get out of the experience?


I proved clojure's viability, at least at some level, for doing 3d graphics. It is certainly possible. As swing gets faster, it will get more possible. Plus, not all applications need to update at 60 frames a second all the time. Additionally, my usage of clojure is still quite amateur. As I learn more about the language and swing, I may have significant opportunities to make things cooler.

The app is really cool, tight, and small. It has an application log, a 3d view, and an inspector palette where you can check out the properties of things you are looking at. You can click on a shader file and (at least on my mac) it will open the file in an external editor and start listening for changes. Any time the file changes, it attempts to reload the file and shove it back into the program. If it works then you get a new program. If it doesn't it saves the result, deletes intermediates and prints the error log to the application log. Thus you can sit there and tweak shaders till you are dizzy and the app will continue to display reasonable stuff and tell you exactly why the shader didn't load.

You can switch the antialiasing on or off, which actually changes a bit of the rendering pipeline, and you can choose the amount of aa you like. My laptop, for example, only supports 4x antialiasing. Thus if you select 16x it will try that. Failing 16x oversampling, it will fall back to 8x and try that. It will continue doing this until it finds antialiasing that is supported. This is supported in the graphics library; you can pass it a sequence of surface specifications and it will try each one until it finds one that is valid.

The docking panel framework I found on the web is OK. Not great, but not horrible. It wouldn't work for a commercial application but it is like Christmas for an opensource or shareware application. Plus it is LGPL'd. Anyway, you can dock, undock, and do all manner of stuff with the windows.

The application is built on a heavily threaded architecture. OpenGL can only run in one thread, so that is capped. There is a large thread pool for blocking IO and a smaller thread pool (the number of processors) for CPU-bound processing. This is taken care of using clojure's agents which I bastardize to do what I want. I don't use them the way they were intended; I just use their send and send-off commands.
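
Roughly, the bastardization looks like this (a sketch; the sleep stands in for real blocking IO):

(def worker (agent nil))

;; send => fixed, CPU-sized thread pool; send-off => expandable pool meant for blocking IO
(defn run-cpu [f] (send worker (fn [state] (f) state)))
(defn run-io  [f] (send-off worker (fn [state] (f) state)))

;; usage: pretend to block on a file read, then hand the result to the CPU pool
(run-io (fn []
          (Thread/sleep 100)
          (run-cpu #(println "md5 the bytes here"))))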

So, for instance, let's say you want to load a file. All files are md5'd upon load so that I can avoid doing anything redundant with large pieces of data. Let's say this file is a vertex shader that is used in one or more programs.

An IO thread loads the file into a byte buffer. It then hands this byte buffer to a CPU thread to do the md5. Finally, the rendering system picks up this new buffer (during the render loop) and tries to make a shader with it. Should it succeed, it finds all programs that included the old shader and attempts to re-link the programs with the new shader and its old counterpart (the shader could have been either vertex or fragment). If it succeeds it replaces the old program with the new one. At each point it logs something.

The log data gets shoved onto a CPU thread where it is split into lines and has the module and message type prepended. The application log messages list then gets appended with the new log data.

There is another thread that every so often, perhaps 3 times a second, checks to see if the written log messages are different from the application log messages. If they are, it copies the app log messages to the written messages list and fires off a thread that takes the written messages list and builds one large string of them all concatenated together. It then fires off yet another task using SwingUtilities/invokeLater that does the actual text setting on the log window object.

I didn't want logging to block whatever thread is doing the logging. I didn't want the log window to be appended to very often because this is an extremely expensive operation, especially when the log is long (it is capped at around 1000 lines right now in a clever, lazy fashion using 'take'). I wanted any real work to be done in a CPU thread and not in the logging thread or the UI event thread. Finally, you shouldn't access swing components outside of the swing thread if you can help it.
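
The shape of that throttled update, as a sketch (hypothetical names; the real thing is driven by a timer thread a few times a second):

(import '(javax.swing JTextArea SwingUtilities))

(def app-log (atom []))       ; worker threads conj log lines onto this
(def written-log (atom []))   ; the value the log window was last built from

(defn log! [line] (swap! app-log conj line))

(defn flush-log! [text-area]
  (let [messages @app-log]
    (when-not (identical? messages @written-log)   ; cheap has-anything-changed check
      (reset! written-log messages)
      ;; build the big string off the swing thread...
      (let [text (apply str (interpose "\n" (take 1000 messages)))]
        ;; ...and only touch the swing component on the swing thread
        (SwingUtilities/invokeLater #(.setText text-area text))))))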

So, the point is that I am using threads like they are growing on trees, because I have two cores and I know for a fact the second core is usually doing absolutely nothing. This is nothing compared to what will happen over time as I get more and more functionality; I love threads and continuation-style threading.

Now, I have all the functionality I want and the application is damn stable and responsive. If it runs, I bet you can't crash it. I can say this because it has very fixed inputs and I have tested every one of them exhaustively, which is feasible in this case.

Let's talk about repl-based development. Let's take my FBO (frame-buffer-object) implementation. Starting from zero, I first write a function that pushes a closure onto a list that gets run on the render thread. It waits for this function to finish (using a CountDownLatch), and then returns the result to the repl *just as if the function was synchronous*, even though it happened on the render thread.
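
That helper looks something like this (a sketch with hypothetical names, not the literal code):

(import '(java.util.concurrent ConcurrentLinkedQueue CountDownLatch))

(def render-tasks (ConcurrentLinkedQueue.))

(defn run-on-render-thread [f]
  (let [latch  (CountDownLatch. 1)
        result (atom nil)]
    (.add render-tasks (fn []
                         (reset! result (f))
                         (.countDown latch)))
    (.await latch)          ; block the calling (repl) thread until the render thread runs f
    @result))               ; hand the value back as if the call had been synchronous

;; the render loop drains the queue once per frame:
(defn drain-render-tasks []
  (loop []
    (when-let [task (.poll render-tasks)]
      (task)
      (recur))))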

Next I write a function that creates an FBO. I test this function extensively with the repl right then and there. I pass in nonsense, I pass in large numbers, I find out all about the failure modes of my FBO allocation implementation (of which there are a few). Now I do the same for a simple FBO delete function, chained after an allocate function. Next I test managing maps of named FBO objects so you can create an FBO and refer to it by name to get it later. For the most part, all of this is done without shutting the program down.

In 3d-graphics, this is pretty hard to get right. But designing your code to be run from the repl has the same effect as designing it to be unit tested; it is just a lot tighter and easier to mess with.

Anyway, I used this technique for designing the user interface of my program. Swing layouts really suck to get right. They are time sinks; but a repl and a good testing strategy *really* help speed things up.

OK, so you know what hurt and you know what I thought was cool. Now the next steps...

Here are the famous (and correct) steps to creating good systems:
1. Make it work. Be goddamn sure that you understand the problem.
2. Make it elegant. <== we are here
3. Make it fast. (probably doesn't apply yet).

Now, how do we make it elegant?

1. Remove as much as possible. I call this the code compression stage, although a better way to state it would be the code evaporation stage. This is where you study the language, the code, and really think about your algorithms and the way they are implemented to see if you can think of more concise (but not overly clever) ways of doing things. This is also where I will try to significantly improve the modular decomposition of my code; write better utilities and break code up into generic pieces. This is the best part; where you take a rough sculpture and make it truly a piece of art.

2. Ensure API exposure is consistent. This means that there are consistent names and overall structure to the system. I am doing very poorly on this point right now because I was learning the language while I was writing code. This is also where you separate public API's from internal APIs and document public api thoroughly.

3. Attempt to match the idioms of the language. I used underscores throughout my program because I like them and I didn't realize I wasn't supposed to; I will replace these with dashes because clojure uses dashes. You should also look at your usage of standard library functions and try to make sure you understand how they are supposed to be used. Remove any functions you wrote that are replacements for standard library functions; etc. Go through the clojure libraries and try to match the naming and design conventions of the major packages.

When I code, I do whatever is expedient because I want to see something work. But after I see something work, I want to make it really nice. The above steps are my standard steps and really they just facilitate me thinking about what I did in a very thorough but still abstract manner.

Chris