
Tim Barrass

Freelance C# and distributed systems developer.


Simple things today. It’s time to stop forging ahead with functionality and take a look at making sure the system’s as usable as it could be. First stop: the classes at the front line. redis.workflow has a couple of very closely related classes there – Workflow and Task.

Currently both are instantiated independently by the client, which is flexible but prone to problems: the Task needs to be configured with its Workflow ID, for example, and setting that is the responsibility of the client. Similarly, if a Task has no parents or children then it's the job of the client to instantiate empty arrays for the Task's lookup. It's clear that these are simple structures that grew from needs developed during test and development, but which are now leaking information about implementation.

Now that it's stabilised somewhat, it's time to take a look at Workflow and Task from a different viewpoint. Workflow has taken on a specific role: that of the aggregate at the edge of the workflow management bounded context. Task has leaked out into the client context and is used both to configure the workflow and as the structure the Workflow uses to record task data. Not only are we currently leaking information about the workflow context, we're not protecting the context from crazed clients with destruction in mind, or, for that matter, helping the well-intentioned to do the right thing.

I’d like to look at these two classes in the context of an aggregate in the Domain Driven Design sense and use that to craft them into something more appropriate – and improve the client experience in the process. The first thing to do is to look at how and why these classes are leaking information.

Stop leaking

First I want to stop Workflow and Task from leaking information. Currently, they’re simple C# projections of the data structure in Redis, and as a result involve the client in processes that should be firmly on the other side of the context boundary. Let’s take a look at their state now:

public class Workflow
{
    public string Name { get; set; }

    public IEnumerable<Task> Tasks { get; set; }

    public string ToJson()
    {
        return JsonConvert.SerializeObject(this);
    }
}
public class Task
{
    public string Name { get; set; }

    public string[] Parents { get; set; }

    public string[] Children { get; set; }

    public string Payload { get; set; }

    public string Workflow { get; set; }

    public string Type { get; set; }

    public int Priority { get; set; }

    public Task()
    {
        Type = "";

        Priority = 1;
    }
}

Despite these being simple classes there’s a lot going on here. First of all Task shouldn’t be exposing string arrays because that exposes them to modification by the client. That’s not the intended pattern of use for this class, so I can prevent it by instead returning an IEnumerable<string>. It’s arguable that Task shouldn’t require the client to initialize them either, but in fact that’s missing a subtlety in the contrast between setting the value and just using some default. In the latter case, I could provide some mechanism for allowing the client to not explicitly set parent and/or child lists, if there are no parents or children, rather than forcing them to pass in an empty list themselves.
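A minimal sketch of what that change might look like – the field and property names here are illustrative, not the actual redis.workflow implementation:

```csharp
using System.Collections.Generic;

public class Task
{
    // Backing lists stay private and default to empty, so the client
    // neither mutates them directly nor has to supply empty arrays.
    private readonly List<string> _parents = new List<string>();
    private readonly List<string> _children = new List<string>();

    // Clients get a read-only view rather than the mutable arrays.
    public IEnumerable<string> Parents { get { return _parents; } }
    public IEnumerable<string> Children { get { return _children; } }
}
```

A Task with no parents now just never has any added, rather than requiring the client to pass `new string[0]` explicitly.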

Task.Workflow is an implementation detail. The client doesn't need to ask a Task which Workflow it belongs to. Furthermore, there's no reason why it should be set by the client when it could be set when added to a Workflow – having the client manage that consistency is unnecessary. Similarly, Workflow.ToJson doesn't need to be public – it's geared specifically toward serializing a workflow to hand over to Redis – so it can be marked internal.

I might have the Workflow set it when the Task is added for the first time, but that means updating the state of the Task after construction. I could make it pretty by isolating that state change in an internal setter, but there's a cleaner way. I'll make Workflow responsible for instantiating Tasks by adding an AddTask factory method. Workflow can then configure the Task completely at instantiation and add it to its known list of Tasks at the same time. I'll also make Task.Workflow internal, emphasising that it's not intended for client use.
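The factory shape might look something like this – a sketch only, with illustrative parameters:

```csharp
using System.Collections.Generic;

public class Workflow
{
    private readonly List<Task> _tasks = new List<Task>();

    public string Name { get; set; }

    public IEnumerable<Task> Tasks { get { return _tasks; } }

    // Workflow creates the Task itself, so it can stamp the workflow
    // name on it and record it in one step; the client never sees a
    // half-configured Task.
    public Task AddTask(string name, string payload)
    {
        var task = new Task { Name = name, Payload = payload };
        task.Workflow = Name; // internal setter, not for client use
        _tasks.Add(task);
        return task;
    }
}
```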

Task.Name is an interesting case as that needs to be unique and consistent across all Tasks in a Workflow, as that’s how the parent and child lists are linked to actual Tasks. This coordination has direct consequences on the style of interface I create, as relationships between instances might be either explicit (declared) or implicit (discovered).

Declaring or discovering relationships

Currently this interface relies on the client explicitly declaring all the relationships between tasks: when instantiating a Task the client needs to populate lists of parent and child task names. The alternative is to have the Tasks or the Workflow form them itself, presenting an interface something like:

var workflow = new Task("Name1", "payload1").AddChild("Name2", "payload2");

Each has its advantages and disadvantages. Explicit declaration is prone to problems of consistency; as the client is responsible for ensuring that tasks are named consistently, and that the right names are used in parent and child lists, there’s a risk of inconsistency creeping in. The interface is, however, flexible, and can be used in the same way for simple and complex workflows.

The implicit approach removes the burden of consistency from the client, at least to some extent: the client could still forget to add children, for example. It begins to break down for complex workflows, however – effectively forcing the client to walk the workflow graph to build up the workflow, which is arguably unnecessary. It’s also hard to envision what the interface should look like for tasks with multiple parents.

I’ve decided to stick with the explicit method. The burden of consistency is still on the client, but I can help. Simple consistency checks on PushWorkflow would slow things down a little (and increasingly so with workflows of increasing size) but would at least catch problems at early runtime rather than late.

The checks I'm considering aren't particularly complicated; I just want to make it possible to help the client avoid shooting themselves in the foot later. At this point I don't want to check for cycles in the workflow, for example, although that's the kind of thing I could warn about. Instead, I just want to check that everything's configured consistently – so that if a task states that "A" is its parent, then there should be, at push, a task named A. In the spirit of fair play I'm also going to add a toggle that allows the client to disable these checks if they're certain their workflows are configured correctly, or if they're happy catching these things down the line. What I'll do is build up a list of defined task names, and a list of parents and children requested, and compare those sets at push. If there are any discrepancies then I'll throw a suitable exception and not push the workflow. A special case check that I'm going to run whether or not the client asks for it is that each task is named uniquely.
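One way the bookkeeping might work: each call to AddTask records the declared name and the parent/child references into sets that the push-time check can compare later. This is a sketch under the assumption that `_defined`, `_parents` and `_children` are HashSet fields on Workflow:

```csharp
using System;
using System.Collections.Generic;

// Inside Workflow:
private readonly HashSet<string> _defined = new HashSet<string>();
private readonly HashSet<string> _parents = new HashSet<string>();
private readonly HashSet<string> _children = new HashSet<string>();

public void AddTask(string name, IEnumerable<string> parents, IEnumerable<string> children)
{
    // The unique-name check runs regardless of the consistency toggle.
    if (!_defined.Add(name))
    {
        throw new WorkflowException(
            string.Format("Task '{0}' is defined more than once.", name));
    }

    foreach (var parent in parents) _parents.Add(parent);
    foreach (var child in children) _children.Add(child);
}
```

Comparing these sets at push is then a couple of set-difference operations rather than a walk over the whole graph.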

PushWorkflow then becomes a bit simpler, because I can extract the root parent check and give it to Workflow:

public long? PushWorkflow(Workflow workflow, bool checkConsistency = true)
{
    if (checkConsistency) workflow.VerifyInternalConsistency();

    var json = workflow.ToJson();

    var workflowId = _lua.PushWorkflow(_db, json, Timestamp());

    return workflowId.Value;
}

and Workflow gains responsibility for checking its own consistency, which is as it should be:

internal void VerifyInternalConsistency()
{
    var undeclaredChildren = _children.Except(_defined).Select(t => t.ToString()).ToArray();

    if (undeclaredChildren.Length > 0)
    {
        throw new WorkflowException(string.Format("Some tasks defined undeclared tasks as their children: '{0}'", string.Join(",", undeclaredChildren)));
    }

    var undeclaredParents = _parents.Except(_defined).Select(t => t.ToString()).ToArray();

    if (undeclaredParents.Length > 0)
    {
        throw new WorkflowException(string.Format("Some tasks defined undeclared tasks as their parents: '{0}'", string.Join(",", undeclaredParents)));
    }

    var rootParentCount = 0;
    foreach (var task in Tasks)
    {
        if (task.Parents.Count() == 0)
        {
            rootParentCount++;
        }
    }

    if (rootParentCount == 0)
    {
        throw new WorkflowException("A workflow must have at least one task with no parents.");
    }
}


There’s a subtlety around the way I’ve organised for the consistency check to be triggered. It occurred to me that it’s over-complicated: why get the user to pass in a workflow and a parameter to toggle the internal consistency check? I could just make VerifyInternalConsistency public and let the client call it if that’s what they want to do. I’ve left it as it is because this way I can default to checking the consistency automatically, and allow the client to turn it off.

So at the end of this process I've reduced the number of classes exposed to the client to just Workflow and WorkflowManagement, and the client's less exposed to delayed awareness of misconfigured workflows. We're still talking about everything in terms of raw strings and ints however. That means two things: first, the client is exposed to a risk of applying the right setting in the wrong place; and second, any logic associated with validating those values ends up scattered through the methods that consume them.

Protect with a rich type language

Switching to use a richer type language can also simplify code and help catch errors earlier. Consider this early signature of AddTask:

workflow.AddTask("TestNode1", "Node1", "", 1, new string[0], new string[] { "TestNode2" });

It's basically a set of strings with only the argument name (but isn't that why we give them names?) to differentiate them:

public void AddTask(string name, string payload, string type, int priority, IEnumerable<string> parents, IEnumerable<string> children)

Instead, in some contexts it makes sense to wrap each of these strings in a class, forming a low level type language that the context uses externally to describe the fine grained objects at its boundary. An example:

public class TaskName
{
    public TaskName(string name)
    {
        _name = name;
    }

    public override string ToString()
    {
        return _name;
    }

    private readonly string _name;

}

Yes, you still need to specify the name as a string when you instantiate the class. But as a result, you get clear method signatures like

public void AddTask(TaskName name, Payload payload, TaskType type, TaskPriority priority, IEnumerable<TaskName> parents, IEnumerable<TaskName> children)

that the compiler can help you check; the poor thing’s a bit lost if you just present it a series of undifferentiated strings. It has to assume you know what you’re doing.

Additionally, this gives us a chance to isolate any code associated with checking the coherency or suitability of an input parameter inside the type itself. If a task name can't be null, then we check that it isn't here, in TaskName, rather than in every method that uses one [* note: yes, this is a slightly weak example – we probably still need to check for null when we're passing in a TaskName itself].
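As a sketch, TaskName could take on that check in its constructor – the exact validation rule here is an assumption, not what redis.workflow does:

```csharp
using System;

public class TaskName
{
    private readonly string _name;

    public TaskName(string name)
    {
        // The check lives in the type: every method that receives a
        // TaskName can assume it wraps a usable value.
        if (string.IsNullOrWhiteSpace(name))
        {
            throw new ArgumentException("A task name must be a non-empty string.", "name");
        }

        _name = name;
    }

    public override string ToString()
    {
        return _name;
    }
}

// Usage: a bad value fails at construction, at the boundary,
// rather than deep inside the workflow context.
// var bad = new TaskName("");          // throws ArgumentException
// var good = new TaskName("TestNode1"); // fine
```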

So what we’re doing is attempting to increase transparency at the level of use (by allowing the compiler to check for us) by taking a complexity hit at a finer level, at the level of the types we use at the context boundary.

Having those richer fine-grained types is also often associated with (but not essential for) another useful technique which comes under the same “richer type language” umbrella: we can define standard higher types based on common contexts. For example: a handy empty list of TaskNames, which is useful if you have a Task with no parents. Or children, for that matter:

private static readonly TaskName[] EmptyTaskList = new TaskName[0];

As I mentioned though, moving to this richer type language comes at a cost. It increases transparency at one level while accepting a hit of complexity at another: and having more classes to maintain isn’t the only cost.

The Json serializer can no longer just create the object structure we expect to see at the internal Lua interface because of these rich types. The serializer naturally creates a whole TaskName structure in any Json created, rather than just the string expected by the Json processing in Lua.

There are two options: deal with this in C#, or deal with this in Lua. I’d rather take the hit in C# and keep the structures I deal with in Lua simple. Interestingly (inevitably) the very feature of Workflow and Task that caused me to go down this route – the fact that they reveal the implementation detail underneath – can guide me here. What I need to do is make the original Workflow and Task definitions the internal representation of these things. However, I don’t need to recreate or retain the original types; I just need to make sure I pass a type with the right named parameters to the serializer. I can do that with an anonymous class on the fly:

public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
{
    var task = value as Task;

    var obj = new {
        WorkflowId = task.Workflow.ToString(),
        Name = task.Name.ToString(),
        Payload = task.Payload.ToString(),
        Priority = task.Priority.ToString(),
        Type = task.Type.ToString(),
        Parents = task.Parents.Select(t => t.ToString()),
        Children = task.Children.Select(t => t.ToString())
    };

    serializer.Serialize(writer, obj);
}

At this point I’ve created a richer type language for workflow management that gives the client some compile-time guidance about the way they’re using redis.workflow. It’s arguable, however, that this might be a step too far.

Isn’t this massively over-engineered?

Potentially. I've mentioned several times that in some contexts you might want to do this, or do that. In some contexts developers are disciplined and/or their process can absorb the risk of delays caused by incorrect method arguments. They might be naturally cautious and sustainably safe, or they might be lucky that their tests capture many of these problems.

It's not all down to the developers though. As systems grow in complexity the burden of boilerplate code checking, and the maintenance burden associated with looking at a piece of code and being able to assume that the method is being called correctly, can increase to an extent where this approach brings real benefits. In my experience almost every successful application reaches that level of complexity at some point.

My feeling is that yes, in this particular example this borders on over-engineering. We can take a look at the result and work out why that is, and extract a couple of signs or smells from the process. The wrapper classes we've created are dumb shells. There's no existing constraint logic that we've been able to extract from service methods and handle in the rich types instead. These two points could be evaluated for a codebase before generation of a rich type language, and used to determine whether the payoff is worth it. Of course, there's an argument that this is a good foundation for possible future changes, but there's a whole other principle that deals with that…

Finally, it's still not a guard against some types of misconfiguration. If you're taking in a string in your TaskName constructor and someone decides to configure it with the name of your workflow, then – well, you can't guard against that. Some types are more amenable: you might present a limited range of available priorities, or acceptable task types, for example, but this kind of consistency error is higher level than what's discussed here, and more application specific. This is basically an exercise in identifying opportunities for tackling specific problems at a specific boundary in the application, and keeping problems unrelated to that boundary away from it.