Conventional Interfaces: map/reduce and friends

I've had a few queries about some of the code in my Hacker News bookmarklet - which makes use of some JavaScipt 1.6 & 1.8 features: map; reduce; filter; and forEach (sorry, no love for some or every this post). Superficially these are just array methods that take a function and apply it to each element in the array. More profoundly however, they provide the means to break up wildly different problems in similar ways - encouraging code reuse, providing flexibility, and (arguably) enhancing code clarity.

It does this by establishing an abstraction: creating a language that can be used to describe our requirements. If we think about (or re-think) our problems in terms of this language then it becomes easier to decompose and "mix and match" component parts as needed. It turns out that you can restructure a lot of your code using this simple language.

In Structure and Interpretation of Computer Programs (Go and read or watch SICP now - it's free. I didn't realise I knew nothing about programming until I was about halfway through this book) this concept is referred to as creating "conventional interfaces".

As an example, I've dissected a real world problem below. The problem is broken into exercises that can be run and modified on the spot. Each exercise includes some "things to try" - some small tests - which will hopefully reinforce the ideas and make everything nice and clear.

The problem

The HN bookmarklet runs on the Hacker News main page and refreshes the (normally static) view every few minutes. Changes in article rank, and votes are highlighted as to provide an overview of changes at a glance.

For this case study we'll concentrate on the piece of code that removes the indicators for "runs of rising articles": after the new HN page is loaded and the rise and fall indicators have been added, I take all of the "rises" - the green circles that indicate a story has risen - and look for runs of same-valued rises. If a run (more than 2) articles are found, the indicators fade away. The reason is that a run is (probably) caused by some other story nose-diving, therefore every other article above it gets a bump - and the page has a long trails of green dots on the page that aren't very informative.

An answer

My first solution was to just loop over each rising element using nested loops. If the stories rank was 1 above the previous, and the change the same as the previous, then it was added to a temporary array. When the order was broken then temporary array was iterated and all the contained elements were removed from the page. Then the process was started again until all elements were checked.

It seemed to work, but I started to wonder if jQuery's find method always returned the elements in the same order as the DOM nodes on page. I couldn't find anything that confirmed or denied it, so I decided it would be wise to sort the articles by rank before I did the run checks. Throwing a sort into the mix really uglified my code (and I wasn't liking that local state variable anyway) - so a new strategy was formulated. Map/reduce!

The beauty of this approach is that once you've decomposed the problem in the correct terms, you can easily start mashing things together. If you suddenly need an extra sort, or and additional check, then it's just an extra black box in the middle of a chain of operations. Here's how we'll tackle our problem in map/reduce language: Read the following code as a paragraph, substituting any periods with the word "then":

$("getAllOfTheGreenCircles")
    .map( the DOM elements in to nice JavaScript objects )
    .sort( them in order of rank )
    .reduce( the elements into arrays of runs )
    .filter( out all of the long runs )
    .reduce( the long runs into a one dimensional array )
    .forEach( element in the array, remove it )

That's the plan - we just need to write some functions in place of the english descriptions.

Each step will be explained below with some runnable code. Once we've completed each step (for example, after we cover the first "map") then in subsequent examples the code will be replaced with a named function - so it's nice and short and we can concentrate on the proceeding step. The named functions are editable way down the bottom of this page, so once you've done the exercises you can modify and mix & match them.

Because our problem is based on "real" code, I'll use markup from the Hacker News bookmarklet. The bookmarklet injects elements in front of Hacker News articles that have changed. The indicators are green or pink - green is a rising article, pink is a sinking one. The number shows how many rank position it has moved. Next to the indicator is the actual rank of the article at the moment.

For this case study, we'll be concerned with the small green circles indicating that an article has risen, and it's corresponding rank.

11. Show HN: A particle accelerator I made on the weekend

12. One week live, $5bn profit

13. Who moved my comment votes?

34. The new bubble is peppermint flavoured

15. TechCrunch just posted something

26. Monetizing your interpersonal relationships

27. Ask HN: Does my bum look big in this?

28. Mr Speaker spamming more crapy blog posts

Step 1: Enumerating with jQuery

Before we can start mashing up our list elements we need a list. All of the green "rises" have the class name hnu-up and we can fetch them with jQuery. Because jQuery returns a jQuery list, we have to get the real underlying array by calling .get() on the result.

Click the "run" button below to execute the code. You can modify the code to see your changes in action - but I haven't put too much error checking in place so it will be quite easy to smash this page, and you'll need to refresh.

$(".hnu-up").get()

Nothing very interesting about step 1, but we can see how the code runner works: the code is read and evaluated, and the results are displayed below. In the output, you can click on the output line numbers to see which DOM objects in the article list have been affected. For each exercise that follows, the article list will follow you down the page.

Things to try

  • Remove the call to .get() and see the difference. The elements are the same, but on is wrapped in a jQuery object. If we don't have a real JavaScript array then we won't be using the native JavaScript functions.

Also, for those brave souls not using jQuery - if you try selecting the spans with document.querySelectorAll then you'll receive a "nodelist", not an array. To get a real array you need to slice the results: Array.prototype.slice.call(nodelist).

Step 2: Mapping with map

The Map function operates on an array - running each element of the array through a function that you define. The return value of the function becomes the new element in the list. We say that the function maps elements to their new value. For an example, if we wanted to map an array of integers to a new array where each element was tripled:

[1, 2, 3, 4].map(function(el){ return el * 3; });

gives a new array with values: [3, 6, 9, 12]. We only used the first parameter (el) above, but the callback function actually gets given 3 parameters for each element: the array element itself; the element's index in the array (zero indexed, of course), and the entire array (for some reason. Well, actually good reasons, but you won't need this too often to begin with).

Our problem description was to map( the DOM elements in to nice JavaScript objects). In step 1 we got the array of DOM elements, and now we want to extract the juicy data values it contains. For each element we'll return an object literal with the element's "rise", its rank, and a jQuery wrapped object so we can manipulate it easily later on.

$(".hnu-up") .get() .map(function(el) { var rank = el.nextSibling.textContent; return { rise: parseInt($(el).text(), 10), rank: parseInt(rank.replace(/[^0-9]/g,""), 10), $: $(el) }; });

This code is a bit weird looking - as we had to do some "DOM level" stuff with the nextSibling and textContent. This is because jQuery doesn't really do text nodes - but Hacker News' markup required it. Because I wanted to keep this case study real I didn't over-simplify it.

The el item our callback receives is the circle DOM node from jQuery - so to get the "rise" we just grab its text value and parse it to an integer. The rank is similar, but requires a lil' bit of regex-ing to get rid of phantom quotes that appear when you manipulate whitespace elements.

The result of map will always be a new array with the same length as the original. The items might be exactly the same as the input array, or like our example, might be completely different. Because the result is an array, it can then be further manipulated by array methods - such as another map!

jQuery contains a couple of it's own map functions that work similarly (though not identical) to the JavaScript version. If you're interested, read of my comparison of jQuery vs jQuery vs JavaScript.

Things to try

  • Replace the ".hnu-up" elements with the down elements: $(".hnu-down").
  • Remove the final semi-colon and add another map function that doubles each item's rank.
  • Instead of our funky object, return a simple array of "rank" integers [1,2,3,5,6,7].

Step 3: Sorting with sort

Next up, we need to sort( them in order of rank ). Sorting has been in JavaScript forever so we won't spend too much time here. If you call sort() on an array it will try and sort the elements alphabetically or numerically but you can also specify your own comparer function.

The function compares each item "a" with each other item "b". If the return value is less than 0, then "a" comes before "b". If it's greater than 0 then "b" comes before "a". Otherwise, they are equal in sort importance. We want to sort our list in order of rank, so we subtract the "a" rank from the "b": a.rank - b.rank.

$(".hnu-up") .get() .map(extractValuesFromElement) .sort(function(a, b) { return a.rank - b.rank; });

Note that I've replaced the anonymous map function we wrote in the last example with a named function "extractValuesFromElement" - it's the same code, but naming it makes these un-syntax-highlighted textboxes much easier to decipher! The code is defined and editable below in the "named functions" section.

Things to try

  • Sort the elements in descending rank order.
  • Sort the elements by "rise".

Step 4: Reducing with reduce

Ok, the big one. Reduce can seem a bit weird at first - especially because it's a bit of a "swiss army knife" whose function can't be summed up as easily as map, or filter. Reduce is sometimes called "fold", or "compress" (or lots of other names) which suggest that it makes arrays smaller in length - and they usually do. The purpose is build up a single result based on the values of all the individual elements. The most common example is adding all of the elements of an integer array:

[1,2,3,4].reduce(function(accumulator, element) {
    return accumulator + element;
}, 0);

Returns 10. I find reduce most easy to use when I name the parameters acc (for the accumulator) and el (for the element). The accumulator is passed along as to each element so you can keep track of what's going on. In the example above we gave it the initial value of 0, and then each element got added to this running total.

But the accumulator does not have to be an integer. Reduce is often used for building up "lists of lists". We need this functionality to achieve our step 4 goal to reduce( the elements into arrays of runs ).

Our input is the sorted list of all elements, and our output will be a list of "list of runs". We'll use reduce to group sequential elements with the same rise value. This is going to involve a bit of array manipulation. We start with an empty list that we pass as the accumulator. For each element, we get the value of the last run, or create a new empty list.

If the element is "in order" then we append it to the existing list, otherwise we add it to a new list that contains just itself. The test for "in order" is a small function that checks that the new rank is one more than the previous element's, and the rise is the same.

$(".hnu-up") .get() .map(extractValuesFromElement) .sort(sortByRank) .reduce(function(acc, el) { var head = acc.length ? acc[acc.length - 1] : [], isSeq = function(prev, el){ return el.rise === prev.rise && el.rank === prev.rank + 1; }; if(head.length && isSeq(head[head.length - 1], el)) { head.push(el); } else { acc.push([el]); } return acc; }, []);

The output shows we now have reduced our array to 3 elements. Each element is an array of items that are in sequentially ranked with the same rise. If you click on the "0:" or "2:" you can see the grouping is correct.

Reducing is used to tally or group - so if you find yourself looking for totals, or trying to push elements together - reduce is your man. If remember the parameter names "accumulator and element" then things will be clearer when you look at examples that unhelpfully label them "a and b"!

Things to try

  • Modify the internal isSeq function to only check that the rise is the same (remove the "rank" clause).
  • Modify it to only check that the rank is sequential, and don't check for "rise".

Step 5: Filtering with filter

Now we're cooking. We have an array of arrays containing elements that we potentially want to remove. 2 out of the 3 arrays have more than one item in them - these are the ones we are interested in. To get them we need to filter( out all of the long runs ). Filter is very similar in structure and use to map, but instead of modifying elements, we return a boolean value that indicates if we wish to keep the element in the new array. Our output array will therefore either be the same length (if everything matches the test), or shorter.

Our definition of a "long run" is a sequence of 2 or more items. Therefore, we test the element and return true if it's length is greater than 1.

$(".hnu-up") .get() .map(extractValuesFromElement) .sort(sortByRank) .reduce(groupIntoRuns, []) .filter(function(el) { return el.length > 1; });

After applying our filter, we are left with arrays of long runs. Like map, we also are given the array index and complete array if we need it. Index is particularly useful for filtering, as you we do things like return index < 3 to get the first 3 elements, or return index % 2 === 1 to leave only every other element.

Like map, jQuery has it's own filter method for filtering lists of jQuery objects. It can work the same as the JavaScript version but more commonly it's used to filter based on a CSS selector string.

Things to try

  • Return only elements that have runs of 2 or less.
  • Go back to exercise 1 and select every other (or every 3rd) element.

Step 6: Flattening with... reduce?

We're still not quite there yet. Although we know all of the elements we want to remove, they are stored inside arrays. We could do a nested loop and delete them, but that's looking pretty ugly to us now. A cleaner way is to reduce( the long runs into a one dimensional array ). Ok, obviously I'm saying we need to use reduce, but in many functional languages "reducing an array of arrays to an one dimensional array" has a name: it's called flatten.

Thankfully flatten is very easy to write with reduce (and in most languages it actually is). Our accumulator starts as an empty list, and for each element (each array) we just concatenate it to the accumulator.

$(".hnu-up") .get() .map(extractValuesFromElement) .sort(sortByRank) .reduce(groupIntoRuns, []) .filter(returnAnyLongRuns) .reduce(function(acc, el) { return acc.concat(el); },[]);

The result: a flat array containing the five elements that have gots to go. The flatten-via-reduce is quite idiomatic, so be sure to commit it to memory.

Step 7: Iterating with forEach

All that's left do now is eradicate our green circles. The approach is simple: forEach( element in the array, remove it ). The forEach method applies a given function to every element in the collection. It's like a more beautiful for loop.

$(".hnu-up") .get() .map(extractValuesFromElement) .sort(sortByRank) .reduce(groupIntoRuns, []) .filter(returnAnyLongRuns) .reduce(flattenRuns,[]) .forEach(function(el) { el.$ .fadeOut("slow") .delay(2000) .fadeIn(); });

Luckily, way back in exercise 2 we had the forethought to keep a reference to the DOM element's jQuery selector just for this moment. We fade it out slowly so the user sees what we are doing. Then, because it's just an example, we bring it back so you can keep on playing.

Things to try

  • Replace the anonymous function with the named removeIndicator function (listed below).
  • Re-write all of your code you've ever written with map/reduce and friends.

Named Functions

Nothing to see here: these are the named functions we've been substituting above to keep code nice and short. If you make changes to the functions be sure to hit the "update functions" button to update them (they're just global functions).

// Functions for examples extractValuesFromElement = function(el) { var rank = el.nextSibling.textContent; return { rise: parseInt($(el).text(), 10), rank: parseInt(rank.replace(/[^0-9]/g,""), 10), $: $(el) }; }; sortByRank = function(a, b) { return a.rank - b.rank; }; groupIntoRuns = function(acc, el) { var head = acc.length ? acc[acc.length - 1] : [], isSeq = function(prev, el){ return el.rise === prev.rise && el.rank === prev.rank + 1; }; if(head.length && isSeq(head[head.length - 1], el)) { head.push(el); } else { acc.push([el]); } return acc; }; returnAnyLongRuns = function (el) { return el.length > 1; }; flattenRuns = function (acc, el) { return acc.concat(el); }; removeIndicator = function (el) { el.$ .fadeOut("slow") .delay(2000) .fadeIn(); };

In conclusion...

I hope this case study hints at the power of establishing conventional interfaces. Breaking up your tasks into black box components can be a fantastic way to reduce complexity - making it easier to spot issues or highlight possible improvements. But, be warned... these things can become addictive - and you are always left feeling that there might be an even cleaner, more concise way to do things: because there always is!

4 Comments

  1. Erik wrote:

    You are a very good explainer. I read all sorts of programming blogs and find myself zoning out fairly often but I was able to go through this entire exercise without having any “huh?” moments or my eyes glazing over. Well done.

    Tuesday, May 3, 2011 at 3:28 am | Permalink
  2. DCoder wrote:

    You can replace step 6 with
    reduce(function(acc, el) {
    return jQuery.merge(acc, el);
    }, $())

    to get a single jQuery object containing all the elements. Then step 7 only runs the fading once and jQuery internals can do the iteration over the set. Among other things, that means only one .delay call, which results in only one setTimeout call.

    (My first thought was simply acc.add(el), but that builds a chain of prevObject… merge doesn’t.)

    The same pattern can be used to generate select options or list items from data arrays swiftly:

    /* suppose objects are all of the form {
    ID: number,
    Name: string,
    moreAttr: moreVal...
    } */
    var objects = [obj1, obj2, obj3];
    var options = objects.reduce(function(acc, el) {
     var opt = $('', {
      value: el.ID,
      text: el.Name
     });
     return jQuery.merge(acc, opt);
    }, $());

    $('select#foo').html(options);

    Tuesday, May 3, 2011 at 4:31 am | Permalink
  3. vjeux wrote:

    Nice use of functional constructs :)

    Just a little detail. When you use named function:

    name = function () { }

    you clobber the global namespace, you probably want to do

    var name = function () { }

    which is the nearest equivalent as

    function name() { }
    Wednesday, May 4, 2011 at 1:33 am | Permalink
  4. Mr Speaker wrote:

    Hey Vjeux – I had to make them global for my crappy “code runner” to work properly so the user can update the functions (actually, before I had 2 copies of the functions – and they seemed to get executed in a different scope. Now I just have one, so I’ll try again.)

    Thursday, May 12, 2011 at 1:01 am | Permalink

Post a Comment

Captcha! Please type 'radical' here: *
How did you find this thingo? *