Open Source Basics: NPM Edition

Sep 22, 2016


As software developers, we’ve long used third-party code in our day-to-day work, but these days, it’s much easier to find and integrate it with package managers and searchable repositories.

Inevitably, there comes a time when our unique use of a library exposes a new bug, or we find that we could almost use that sweet tool if only it did this one tiny thing differently. When that happens, we find ourselves popping open the hood and making changes to a third-party dependency.

The same modern cushy systems also make it easier to maintain these changes, collaborate, and contribute our changes upstream. This is what I’m going to talk about today.


I’ll use Node.js’s npm in this example, but the process is similar for other languages' packaging systems like RubyGems or PyPI.

So we’ve decided to make a change to a library. Say we’re using the npm package foo, referenced in our application’s package.json file like this:

"devDependencies": {
    "foo": "1.2.3",
    ...

The first step, of course, is to clone the repository. Make sure to check out the same revision that your application is currently using. (It’s probably a recent release, not trunk.)
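As a rough sketch of that first step, the commands below clone a repository and start a working branch from the release the app already uses. The tag name v1.2.3 and the branch name are assumptions (projects name their release tags differently), and a throwaway local repository stands in for the real upstream so the example runs offline.

```shell
#!/bin/sh
set -eu

# Identity for the throwaway commits below (hypothetical values).
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Hypothetical stand-in for the upstream repository so the sketch runs offline;
# in practice you would clone the library's real URL instead.
WORK=$(mktemp -d)
git init -q "$WORK/upstream"
cd "$WORK/upstream"
echo '{ "name": "foo", "version": "1.2.3" }' > package.json
git add package.json
git commit -q -m "Release 1.2.3"
git tag v1.2.3    # assumed tag naming; run `git tag` in the real repo to check
echo '{ "name": "foo", "version": "1.3.0-dev" }' > package.json
git commit -q -am "Begin next version"

# The workflow itself: clone, then branch from the release the app uses,
# not from the tip of the development branch.
git clone -q "$WORK/upstream" "$WORK/foo"
cd "$WORK/foo"
git checkout -q -b branch-with-my-changes v1.2.3

# Confirm the working tree sits exactly on the release tag.
CHECKED_OUT=$(git describe --tags)
echo "$CHECKED_OUT"
```

Branching from the tag rather than the latest commit keeps the experiment limited to our own changes, without also pulling in whatever upstream has done since the release.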

With npm, we can reference our local copy like this:

"foo": "file:/Users/johnruble/repos/foo",

This is a memorable but somewhat blunt approach, with a couple of caveats:

file:/ sources do not know about Git. They’re just looking at what’s on disk, so don’t try to reference a specific branch or revision.
This path is simply a source we can install from. To pick up changes, we’ll need to re-run npm install and rebuild our app. If you find yourself doing this repeatedly, look into npm link.

Now that we can build our app using our own custom version of the third-party component, we’re ready to dive in.

Eventually, our experiment is successful. We’ve made changes, and we want to use them in development (and eventually production) builds of our app. After pushing our branch to another remote where it can live for a while, we can reference our repository in our app’s package.json, so that it can be reached by other developers, CI, and deployment:

"foo": "jrr/foo#branch-with-my-changes", //(github shorthand), or
"foo": "git://private.repo.com/jrr/foo.git#branch-with-my-changes",
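Pushing the branch to a second remote might look something like the sketch below. The remote name fork and the commit contents are made up for illustration, and a local bare repository plays the role of the hosted fork so the example runs without network access.

```shell
#!/bin/sh
set -eu

# Identity for the throwaway commits below (hypothetical values).
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Hypothetical stand-ins so the sketch runs offline: a local copy of foo,
# and a bare repository playing the role of our hosted fork.
WORK=$(mktemp -d)
git init -q "$WORK/foo"
git init -q --bare "$WORK/fork.git"
cd "$WORK/foo"
echo "module.exports = 'original';" > index.js
git add index.js
git commit -q -m "Upstream code as released"

# Our change, on its own branch.
git checkout -q -b branch-with-my-changes
echo "module.exports = 'patched';" > index.js
git commit -q -am "Fix the bug we hit"

# The workflow itself: register the fork as a second remote and push the branch.
git remote add fork "$WORK/fork.git"    # in practice, the fork's real URL
git push -q fork branch-with-my-changes

# The branch on the fork now points at our latest commit.
LOCAL_SHA=$(git rev-parse HEAD)
REMOTE_SHA=$(git ls-remote --heads fork branch-with-my-changes | cut -f1)
echo "$REMOTE_SHA"
```

Once the branch lives on the fork, the package.json reference resolves for anyone who can reach that remote, not just on our own machine.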

Keeping a separate fork allows us to keep moving forward for now, but eventually, we’ll probably want to…

On the flip side, there are several advantages to making our fork obsolete by contributing the work upstream:

So, we’ve decided to submit our changes upstream. How do we do it?

We’ve been working from a tagged release of the library, but changes are typically made on a develop or master branch. Merge the latest code from upstream into your branch (or better yet, rebase onto it).
Run the library’s tests to make sure it’s still behaving correctly.
Use this updated version of the library in your app, and run your app’s tests to make sure the library is still behaving the way you want.
Clean up your branch (squash commits, remove commented code, etc.). It may be easier to just check out a new branch from master and apply all your changes to it in one commit.
Write tests! This is critical since we’re working with code that 1) we depend on, and 2) is not under our control. In particular, write tests to specify and document the behavior we need, and to defend our changes against accidental regression in the future.
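The Git side of the steps above can be sketched roughly as follows. The remote name upstream, the branch names, and the file contents are all assumptions for illustration. The library's real development branch may not be called master, and its test command may differ. Local throwaway repositories stand in for the real ones so the example runs offline.

```shell
#!/bin/sh
set -eu

# Identity for the throwaway commits below (hypothetical values).
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Hypothetical stand-in for upstream, with a tagged release on master.
WORK=$(mktemp -d)
git init -q "$WORK/upstream"
cd "$WORK/upstream"
git symbolic-ref HEAD refs/heads/master
echo "v1" > lib.js
git add lib.js
git commit -q -m "Release 1.2.3"
git tag v1.2.3

# Our clone, with two work-in-progress commits on top of the release.
git clone -q -o upstream "$WORK/upstream" "$WORK/foo"
cd "$WORK/foo"
git checkout -q -b branch-with-my-changes v1.2.3
echo "fix" > fix.js
git add fix.js
git commit -q -m "WIP: first attempt"
echo "fix, cleaned up" > fix.js
git commit -q -am "WIP: tidy"

# Meanwhile, upstream moves on.
(cd "$WORK/upstream" && echo "docs" > README.md && git add README.md \
  && git commit -q -m "Add README")

# Step 1: bring in the latest upstream code by rebasing onto it.
git fetch -q upstream
git rebase -q upstream/master

# Steps 2 and 3 would run the library's and the app's tests here (e.g. npm test).

# Step 4: apply all our changes to a fresh branch as a single commit.
git checkout -q -b for-upstream upstream/master
git merge -q --squash branch-with-my-changes
git commit -q -m "Handle the case our app needs"

# Exactly one commit now sits on top of upstream's development branch.
COMMITS_AHEAD=$(git rev-list --count upstream/master..HEAD)
echo "$COMMITS_AHEAD"
```

The squash keeps the history we offer upstream to one reviewable commit, while the original branch still holds the work-in-progress commits if we need to refer back to them.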

"foo": "github:user/foo.git#3f25967e",

Finally, when our changes are released with the library’s next version, we can switch back to vanilla upstream:

"devDependencies": {
     "foo": "1.2.4",

It feels good to remove the lingering risk that the fork represented for our project, and also to know that other developers are using our code!

This post originally appeared on Atomic Spin and may contain input and edits from some of my colleagues.

JRR