How to join an open source project

I’ve been wanting to sharpen my edge against an open source project recently. Specifically, my C++ edge.

As such, I’ve started working on LibreOffice. How did I decide on LibreOffice? I wanted a C++ project, and I knew it was written in C++.

So, how did I get started? I went to the LibreOffice website, where there was a big link to the developers’ section on the front page. I followed the instructions there, downloaded the code, followed the instructions for building it, and then started looking for the bug list. LibreOffice has, very awesomely, a section on their developers’ site called Easy Hacks, which is where I started.

I assigned a bug from Bugzilla to myself, and now I’m setting myself up with a bit of a unit-testing framework. We’ll see how it goes…

Calm Week

Nice week this week. I started out with an incredibly large and neat refactoring; it looked like it was gaining in size and complexity until a colleague reminded me to slow down, make atomic commits, and give each commit a specific purpose. It meant I stopped, took a few minutes out, and worked out what the order and purpose of each commit was.

When I did that, it became incredibly clear what my order of operations should be, and it also became clear just how much extra value small commits have. I was committing bug fixes that were immediately available, instead of making coworkers wait for a code dump. It’s always good to be reminded of self-discipline.

I spent a bunch of night time working through the const keyword and its relationship to pointers. Here were the scenarios I wanted to understand:

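const foo* bar = bam;        // pointer to const foo: the data is read-only, the pointer can be reseated
foo* const bar = bam;        // const pointer to foo: the address is fixed, the data can be changed
const foo* const bar = bam;  // const pointer to const foo: neither the address nor the data can change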

Once I could conceptualise the idea that sometimes you want a constant address and sometimes you want constant data, and that this is separate from an ordinary variable, which stands for the object’s whole memory footprint rather than an address that points at it, it all fell into place.
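To make the three cases concrete, here is a small sketch of what each declaration allows and forbids. The `foo` struct and the function names are made up for illustration; the commented-out lines are the ones I would expect a compiler to reject.

```cpp
// Hypothetical stand-in for the `foo` type used in the declarations above.
struct foo {
    int value;
};

// Pointer to const data: the pointer can be reseated,
// but the data cannot be changed through it.
int read_through_pointer_to_const(const foo* bar, const foo* other) {
    // bar->value = 1;  // would not compile: the data is const through bar
    bar = other;        // fine: only this local pointer is reseated
    return bar->value;
}

// Const pointer to mutable data: the address is fixed,
// but the data it points at can be changed.
void write_through_const_pointer(foo* const bar, int new_value) {
    // bar = nullptr;   // would not compile: the pointer itself is const
    bar->value = new_value;
}

// Const pointer to const data: neither the address nor the data can change.
int read_through_const_pointer_to_const(const foo* const bar) {
    // bar = nullptr;   // would not compile
    // bar->value = 1;  // would not compile
    return bar->value;
}
```

Reading the declaration right to left helps: `foo* const bar` is "bar is a const pointer to foo", while `const foo* bar` is "bar is a pointer to a foo that is const".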

Beyond that, then, I watched a lot of Avatar this week. My girlfriend and I have really been into it; it’s such a cool little show. I’m really impressed by how consistent the writers have managed to keep it; it tracks its continuity really well, and is very consistent in how it portrays the power levels of the characters. Last night, we saw an episode from season 2 in which short stories are told about each character, and one of them has a story about how he’s sad that it’s X years to the day since his son died. It was pretty heart wrenching; I think both of us had tears pouring down our cheeks afterwards.

It’s weird how I cry far more easily the older I get. I’ve talked to a few friends about it, and they seem to be undergoing the same thing. I wonder what the deal is with that.

Yesterday, I had the task of wiping out a date stamp from a binary file format. I used a hex diffing tool to compare two versions of the same file to see what the offset was for the date, and then slowly nibbled away at an in-place edit using Perl. Normally, I use this as a template for my Perl in-place edits:


do
{
    local $^I = '.bak'; # extension for an automagic backup file
    local @ARGV = ("somefile.txt");

    # <> reads each line of the files in @ARGV; with $^I set, print writes back into the file
    while (<>)
    {
        s/YEAR/$year/;
        print;
    }
};

However, yesterday this wasn’t adequate because I lacked knowledge of what byte patterns to throw in a regex. The more I thought about it, the more I was edging towards using read to slurp in the first X bytes, then spew them out into a temp file, spew out 16 bytes, then read from X+16 until EOF and spew that out afterwards.

At that point, it was clear that I should still be doing an in-place edit, but I didn’t properly understand my little template code. So I sat down and wrote a few different example programs, working out the file handle relationships for in-place editing. Eventually, it left me with something like:


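open my $FILEHANDLE, '+<', $file or die "Cannot open $file: $!"; # read/write, no truncation
binmode $FILEHANDLE; # binary file, so no newline translation
seek($FILEHANDLE, $offset, 0) or die "Cannot seek to $offset: $!"; # 0 means from the start of the file
print $FILEHANDLE "0000000000000000"; # overwrite 16 bytes with ASCII '0' characters
close($FILEHANDLE) or die "Cannot close $file: $!";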

And I felt like I’d achieved zen in filehandles.
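
Since C++ is what I’m trying to sharpen, here is roughly how I’d expect the same overwrite-at-an-offset idea to look there. This is only a sketch under assumptions: the function name `overwrite_with_zeros` and its parameters are made up, and it writes ASCII '0' characters just like the Perl version does.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Overwrite `count` bytes starting at `offset` with the ASCII character '0',
// leaving the rest of the file untouched. Returns false if anything fails.
bool overwrite_with_zeros(const std::string& path, std::streamoff offset, std::size_t count) {
    // in | out opens an existing file for update without truncating it,
    // much like "+<" in Perl; binary avoids any newline translation.
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!file) {
        return false;
    }

    // Move the write position to the offset, counted from the start of the file.
    file.seekp(offset, std::ios::beg);

    const std::string filler(count, '0');
    file.write(filler.data(), static_cast<std::streamsize>(filler.size()));
    file.flush();

    // The stream converts to false if the seek or the write failed.
    return static_cast<bool>(file);
}
```

Something like `overwrite_with_zeros("somefile.bin", offset, 16)` would then mirror the Perl above. Opening with both `in` and `out` matters: opening with `out` alone would truncate the file.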

Release cycles

I spent a lot of time yesterday thinking about the release model of “one major release, one minor release a month”, and about how websites have traditionally been on a “release ten times a day, who cares?” schedule. I was wondering if there was any proof one was better than the other.

And it crossed my mind that Google, traditionally a web company, stepped into the “we release binaries” market a long time ago, but with Chrome they stepped up their game big time: Chrome seems to release new versions so quickly I’ve never been able to keep track of the version numbers. Indeed, the version numbers in Chrome don’t really mean much at all. Firefox is now on the same kind of release model.

I’m wondering if this is like distributed source control; we’ve just seen the first couple of big products to adopt a completely different way of viewing a traditional process, but it’ll take the rest of the world a few years to notice that it’s more than a flash in the pan.

Ah, the joys of learning something new

So, I’m going through the MS recommended “C++ Beginners Guide”. I have to keep reminding myself that it’s useful to type this stuff out, even though it’s way, way simpler than anything I’ve written in years.

It’s neat coming to a guide like this already understanding OOP pretty well via Ruby; it’s cool seeing what the guide doesn’t address. I often think that reading real beginners’ guides is good for a programmer’s soul: it lets you remember where you came from and focus on where you are now. I often think that one of the worst things we can do for ourselves is forget where we came from: how do I know what direction I’m headed in if I can’t see where I came from and how I got here?

Of course, there’s also the idea that I need to do something with this knowledge, so I’ve set myself a challenge: I normally install Cygwin and use its outstanding ports of a bunch of Unix tools to help me remain sane in Windows. My challenge to myself is that I can’t install Cygwin, and if there’s a program I absolutely must have then I have to write it in C++ myself. That should give me a good solid heaping spoonful of real-world experience to continue forwards with my ideas for developing on Windows.

I’m also restricting myself to Visual Studio Express, though there are some libraries it doesn’t provide that’d be nice for me to have.

Refactoring

So, my little company’s been writing a lot of Perl recently, and it’s been good to be reminded how cool the language is, while also being a bit annoyed by some of its limits.

The big thing I’ve been picking up on, though, is how easy it is to violate separations of concerns. And once you start, it’s hard to solve.

As such, I’ve been looking at refactoring browsers more. There’s a decent Perl one, Padre, but I’d like the power that comes with something more, like Eclipse for Java or Visual Studio for C#. Extract method alone would make this stuff so much simpler.

Between a refactoring browser and a good suite of tests, of course, code becomes a beautiful and flexible thing.

Trying to Improve

I’ve been writing a lot of Perl recently, and had a few solid insights into what it means to program better than a beginner. Not like a professional, but not like a beginner.

Writing to the interface has been huge for me. Writing to the interface, not changing the interface.

Slowly working out how to keep my functions free of side effects is another. It’s incredibly hard to do; it’s taken me years to grok what it means for a function to have a side effect and to discipline myself to write side-effect-free code. I’m not great at it, but I’m better than okay.
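
For what it’s worth, this is the kind of contrast I have in mind, sketched in C++. All the names here (`running_total`, `add_to_total`, `add`, `doubled`) are made up for illustration.

```cpp
#include <vector>

// Hypothetical global: hidden state that lives outside any one function.
int running_total = 0;

// Has a side effect: it changes state outside itself, so calling it twice
// with the same argument gives two different answers.
int add_to_total(int amount) {
    running_total += amount;
    return running_total;
}

// Side effect free: the result depends only on the arguments,
// and nothing outside the function is touched.
int add(int total, int amount) {
    return total + amount;
}

// Side effect free over a collection: the input is taken by const reference
// and left alone, and a new vector is returned instead.
std::vector<int> doubled(const std::vector<int>& values) {
    std::vector<int> result;
    result.reserve(values.size());
    for (int v : values) {
        result.push_back(v * 2);
    }
    return result;
}
```

The test I try to apply: if I call the function twice with the same arguments, do I get the same answer, and is the rest of the program unchanged afterwards? `add` and `doubled` pass; `add_to_total` doesn’t.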

A really interesting way to learn about side-effect-free code is to start writing Ruby. The Ruby community really encourages tons of small methods on an object, and that starts training your intuition to understand how to make one method do only one thing, and chain things together.

I have so much improvement to do, though. I don’t feel like I really understand Perl or Ruby or C or Objective-C at all. C, I’m getting closer with: understanding how pointers and the address-of operator work was crucial, as was getting a mental image of what a type is in C (think of it as a way of indicating how much memory something needs).
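
A tiny sketch of that mental image, written as C++ here to match my other experiments, though apart from the casts and `std::` it is essentially plain C. The `point` struct and the function names are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical struct: its type tells the compiler it needs room for two ints.
struct point {
    int x;
    int y;
};

// Writes through a pointer. The caller passes an address obtained with the
// address-of operator, e.g. set_to_zero(&n).
void set_to_zero(int* target) {
    *target = 0;
}

// The pointed-to type decides how far "the next one" is: p + 1 moves
// forward by sizeof(int) bytes, not by one byte.
std::ptrdiff_t bytes_between_neighbours(const int* p) {
    const char* first = reinterpret_cast<const char*>(p);
    const char* second = reinterpret_cast<const char*>(p + 1);
    return second - first;
}
```

So `sizeof(point)` is at least twice `sizeof(int)`, and stepping an `int*` by one lands on the next `int`. That is the "how much memory something needs" idea doing real work.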

On another topic, Uncle Bob’s been giving some really interesting talks recently about what the new programming paradigm might include, what direction programming should be taking. His recent interview on the pragmatic programmer’s podcast is well worth checking out.