At first sight, the terms of reference (TOR) looked exciting: an ambitious, nationwide programme that would have to be assessed for its replicability in other contexts. But our level of excitement dropped dramatically when we studied the TOR in more detail. Although all DAC criteria were listed (relevance, effectiveness, efficiency, impact, sustainability), the key questions under those headings seemed surprisingly modest. They focused chiefly on programme process and results among direct programme stakeholders, i.e. the non-governmental organisations that had received grants and free training under the programme. That is, the evaluators would ask those who had obtained those goodies whether they felt the programme was effective.
The reputable international accounting firm that has run the initiative (and drafted the TOR?) should know that a certain amount of bias might cloud the judgment of people who have drawn such immediate benefits from the programme.
The TOR make no mention whatsoever of the ultimate beneficiaries – the citizens of that country who are expected to enjoy more responsive and accountable governance as a result of the programme. Although the programme has been going on for several years, the only evaluation question about impact is fairly abstract: it invites the evaluators to speculate about the extent to which the outcomes achieved might contribute to longer-term changes.
If the evaluation is supposed to test the replicability of the initiative, it would seem important to scrutinise the theory of change underlying the programme, the way it has been translated into action and the changes it has contributed to. What the TOR calls for falls short of that – by a long way.
For instance, a special TOR section on potential risks and limitations explicitly rules out quantitative data collection and counterfactuals. Instead, the prospective evaluators are invited to rely primarily on their own judgment, “backed by qualitative evidence” that would be drawn from statements and reports produced by the organisations running the programme. It is unclear what specific risks such restraint is supposed to address – not the risk of discovering the programme has passed unnoticed in the society it is supposed to strengthen, we hope?
We love qualitative research and we feel it is important to gather stakeholders’ views on the programmes they are implementing. And we do not say that every development project needs an impact assessment. In many cases, an exercise that focuses on process and immediate outcomes can be perfectly sufficient. (For instance, many smaller initiatives are so grossly understaffed or underfunded that one can’t expect them to produce any significant results anyway – in such a situation, a combination of in-depth conversations and an experienced evaluator’s own judgment may help to draw attention to necessary adjustments.)
But if you want to gauge the replicability of a multi-year, multi-million-dollar initiative, then you’d better do it with the thoroughness and transparency it takes to produce robust findings. The firm that runs the programme convinces clients around the world to invest massive amounts of money in accountability. It should know how to run the kind of evaluation you need to find out whether a programme works. Are transparency and external scrutiny less important when it comes to one’s “own” programmes?
Who owns those programmes, anyway? But that opens a different discussion...