Bin packing and Scala

It’s not much – but here’s my first Scala app, built with scala and scala-js to provide a bespoke UI element:

So what’s it doing? I wanted to model how well a population of tasks of varying duration could fit onto N cores, to compare with the performance we saw in a production environment – something I’ve done before using a bin packing algorithm.

At the same time I was looking for a reason to write something in scala, partly to get a different perspective on the C# I’d been entrenched in.

Bin packing

Bin packing is a combinatorial NP-hard problem: how can we best fill bins size with items whose size varies? It has a large number of applications from filling boxes for transport to cutting stock in garment manufacture, but the application I’m interested in here is determining how best to spread a set of compute tasks across a number of cores.

Practically speaking best implies minimisation of either core count or some measure of overall duration. The environment I’m considering currently has a fixed number of cores with no (fiscal) cost benefit to letting some lie idle, so generally I’d like to minimize the average total duration for each core. There is a problem here: in practice duration distributions are roughly lognormal with a long long-duration tail, which tends to skew the spread across cores.

Suitable alternatives might be replacing the average bin fill with the maximum bin fill, or even the minimum bin fill.

Scala style

After a little time to familiarise myself with the syntax – I created a very procedural example that I then looked to refactor into something simpler – the bulk of my algorithm and supporting code looks like this:

At first look it’s very clean; in fact, it’s almost recognisably pseudocode. The shift from infix to postfix notation is a bit of a wrench, and I’m not entirely convinced that (items dequeue) _2 is simpler than items.dequeue._2, but I suspect that the problem with arguments of expressiveness ultimately come down to familiarity, at least in the short term.

Even this simple code brings up some questions. randomPopulation, for example, is stated without parentheses. This seems appropriate as there are no side effects – randomPopulation creates a random population of items that’s used to initialize the items val. Intellij still warns me about using an advanced language feature however, stating that it’s advisable to include the parentheses whenever the method call is more than a simple object.

This style is fine as far as it goes but I can’t help asking: how do we generally know whether the method we’re calling has side effects or not? Placing the responsibility – although it’s barely that, as the decision doesn’t actually contribute anything to the compiled codebase as far as I can tell – on the developer using a class doesn’t seem to make sense.

In the case of randomPopulation it seems to make sense, in any event, as there are no side effects beyond the appearance of a populated collection. I’ve also used (item dequeue) however, and there it would seem that I should use the parentheses: dequeueing an item is almost certainly changing the state of the queue, and I suspect that’s considered a side effect. The IDE, however, doesn’t differentiate between this and the randomPopulation example, leaving it up to the developer to determine whether he should be using parentheses or not.

Finally, there’s another subtle example in drawBinContent, which I’ve used without using the parentheses. This brings about an effect – data is written to the console. That brins about a state change in something else, so

Making a quick judgement on not much experience I’d say the brackets-indicating-a-side-effect are pretty much useless; I’d be willing to bet that it doesn’t get used much in actuall day-to-day development. I’ve left these examples as they are, choosing less on screen furniture in favour of pedantic explicitness.

A closure in C# and Scala

Moving away from signalling state changes, placeUsingAverageBestFit has an example of a closure. This’ll be familiar to C# developers; the closure captures the values of average and value. The definition of the lambda is a lot clearer under Scala, as I guess you’d hope.

Under the covers, it seems that the Scala compiler is automatically creating an appropriately parameterised method for the closure, without requiring as much boilerplate. That in itself is very welcome.

Roundup

Scala itself is quite clean but I’m not yet convinced about some of the style guidelines associated with it. If it’s considered important enough highlight side effects – and in a functional/immutable context like Scala it must be – then relying on developers to signal side effects with a stylistic tweak seems weak.

Development of this small app took very little time, but the end-to-end time – the time to become productive in a Scala environment – was enormous. To develop in Scala (especially to begin with) relies on sustained internet access to automatically fetch dependencies, which isn’t necessarily always available. There are also wrinkles in consistency that mean a lot of early development is mired in cargo cult approaches – at the moment, for example, I’m ensuring that the build tool (SBT) plugin I’m using has the same version of Scala that I’m actually developing in, and a dozen other simlar tweak. Frustrating. I’m also having to run Artifactory as a local package cache just to isolate me from intermittent connectivity.