Showing posts sorted by relevance for query QCA. Sort by date Show all posts
Showing posts sorted by relevance for query QCA. Sort by date Show all posts

Saturday, September 26, 2020

EvalC3 versus QCA - compared via a re-analysis of one data set


I was recently asked whether that EvalC3 could be used to do a synthesis study, analysing the results from multiple evaluations.  My immediate response was yes, in principle.  But it probably needs more thought.

I then recalled that I had seen somewhere an Oxfam synthesis study of multiple evaluation results that used QCA.  This is the reference, in case you want to read, which I suggest you do.

Shephard, D., Ellersiek, A., Meuer, J., & Rupietta, C. (2018). Influencing Policy and Civic Space: A meta-review of Oxfam’s Policy Influence, Citizen Voice and Good Governance Effectiveness Reviews | Oxfam Policy & Practice. Oxfam. https://policy-practice.oxfam.org.uk/publications/*

Like other good examples of QCA analyses in practice, this paper includes the original data set in an appendix, in the form of a truth table.  This means it is possible for other people like me to reanalyse this data using other methods that might be of interest, including EvalC3.  So, this is what I did.

The Oxfam dataset includes five conditions a.k.a. attributes of the programs that were evaluated.  Along with two outcomes each pursued by some of the programs.  In total there was data on the attributes and outcomes of twenty-two programs concerned with expanding civic space and fifteen programs concerned with policy influence.  These were subject to two different QCA analyses.

The analysis of civic space outcomes

In the Oxfam analysis of the fifteen programs concerned with expanding civic space, QCA analysis found four “solutions” a.k.a. combinations of conditions which were associated with the outcome of successful civic space.  Each of these combinations of conditions was found to be sufficient for the outcome to occur.  Together they accounted for the outcomes found in 93% or fourteen of the fifteen cases.  But there was overlap in the cases covered by each of these solutions, leaving the question open as to which solution best fitted/explained those cases.  Six of the fourteen cases had two or more solutions that fitted them.

In contrast, the EvalC3 analysis found two predictive models (=solutions) which are associated with the outcome of expanded civic space.  Each of these combinations of conditions was found to be sufficient for the outcome to occur.  Together they accounted for all fifteen cases where the outcome occurred.  In addition, there was no overlap in the cases covered by each of these models.

The analysis of policy influencing outcomes

in the Oxfam analysis of the twenty-two programs concerned with policy influencing the QCA analysis found two solutions associated with the outcome of expanding civic space.  Each of these was sufficient for the outcome, and together they accounted for all the outcomes.  But there was some overlap in coverage, one of the six cases were covered by both solutions.

In contrast, the EvalC3 analysis found one predictive model which was necessary and sufficient for the outcome, and which accounted for all the outcomes achieved.

Conclusions?

Based on parsimony alone, the EvalC3 solutions/predictive models would be preferable.  But parsimony is not the only appropriate criteria to evaluate a model.  Arguably a more important criterion is the extent which a model fits the details of the cases covered when those cases are closely examined.  So, really what the EvalC3 analysis has done is to generate some extra models that need close attention, in addition to those already generated by the QCA analysis.  The number of cases covered by multiple models is been increased.

In the Oxfam study, there was no follow-on attention given to resolving what was happening in the cases that were identified by more than one solution/predictive model.  In my experience of reading other QCA analyses, this lack of follow-up is not uncommon.

However, in the Oxfam study for each of the solutions found at least one detailed description was given of an example case that that solution covered.  In principle, this is good practice. But unfortunately, as far as I can see, it was not clear whether that particular case was exclusively covered by that solution, or part of a shared solution.  Even amongst those cases which were exclusively covered by a solution there are still choices that need to be made (and explained) about how to select particular cases as exemplars and/or for a detailed examination of any causal mechanisms at work.  

QCA software does not provide any help with this task.  However, I did find some guidance in specialist text on QCA:  Schneider, C. Q., & Wagemann, C. (2012). Set-Theoretic Methods for the Social Sciences: A Guide to Qualitative Comparative Analysis. Cambridge University Press. https://doi.org/10.1017/CBO9781139004244 (it’s a heavy read in part of this book but overall, it is very informative).  In section 11.4 titled Set-Theoretic Methods and Case Selection, the authors note ‘Much emphasis is put on the importance of intimate case knowledge for a successful QCA.  As a matter of fact, the idea of QCA is a research approach and of going back-and-forth between ideas and evidence largely consists of combining comparative within case studies and QCA is a technique.  So far, the literature has focused mainly on how to choose cases prior to and during but not after a QCA were by QCA we here refer to the analytic moment of analysing a truth table it is therefore puzzling that little systematic and specific guidance has so far been provided on which cases to select for within case studies based on the results of i.e. after a QCA… ‘The authors then go on to provide some guidance (a total of 7 pages of 320).

In contrast to QCA software, EvalC3 has a number of built-in tools and some associated guidance on the EvalC3 website, on how to think about case selection as a step between cross-case analysis and subsequent within-case analysis.  One of the steps in the seven-stage EvalC3 workflow (Compare Models) is the generation of a table that compares the case coverage of multiple selected alternative models found by one’s analysis to that point.  This enables the identification of cases which are covered by two or more models.  These types of cases would clearly warrant subsequent within-case investigation.

Another step in the EvalC3 workflow called Compare Cases, provides another means of identifying specific cases for follow-up within-case investigations.  In this worksheet individual cases can be identified as modal or extreme examples within various categories that may be of interest e.g. True Positives, False Positives, et cetera.  It is also possible to identify for a chosen case what other case is most similar and most different to that case, when all its attributes available in the dataset are considered.  These measurement capacities are backed up by technical advice on the EvalC3 website on the particular types of questions that can be asked in relation to different types of cases selected on the basis of their similarities and differences. Your comments on these suggested strategies would be very welcome.

I should explain...

...why the models found by EvalC3 were different from those found from the QCA analysis.  QCA software finds solutions i.e. predictive models by reducing all the configurations found in a truth table down to the smallest possible set, using what is known as a minimisation algorithm called the Quine McCluskey algorithm.

In contrast, EvalC3 provides users with a choice of four different search algorithms combined with multiple alternative performance measures that can be used to automatically assess the results generated by those search algorithms. All algorithms have their strengths and weaknesses, in terms of the kinds of results they can find and cannot find, including the QCA Quine McCluskey algorithm and the simple machine learning algorithms built into EvalC3. I think the McCluskey algorithm has particular problems with datasets which have limited diversity, in other words, where cases only represent a small proportion of all the possible combinations of the conditions documented in the dataset. Whereas the simple search algorithms built into EvalC3 don't experience that is a difficulty. This is my conjecture, not yet rigorously tested.

[In the above data set the cases represented the two data sets analysed represented 47% and 68% of all the possible configurations given the presence of five different conditions]

While EvalC3 results described above did differ from the QCA analyses, they were not in outright contradiction. The same has been my experience when I reanalysed other QCA datasets. EvalC3 will either simply duplicate the QCA findings or produce variations on those, and often those which are better performing. 


Tuesday, October 07, 2014

Comparing QCA and Decision Tree models - an ongoing discussion



This blog is a continuation of a dialogue that is based on Michaela Raab and Wolfgang Stuppert's  EVAW blog. I would have preferred to post my response below via their blog's Comment facility, but it cant cope with long responses or hypertext links. They in turn have had difficulty posting comments on my YouTube site where this EES presentation (Triangulating the results of Qualitative Comparative Analyses (EES Dublin 2014)  can be seen. It was this presentation that prompted their response here on their blog.

Hi Michaela and Wolfgang

Thanks for going to the trouble of responding in detail to my EES presentation.

Before responding in detail I should point out to readers that the EES presentation was on the subject of triangulation, and how to compare QCA and Decision Tree models, when applied to the same data set. In my own view I think it is unlikely that either of these methods will produce the “best” results in all circumstances. The interesting challenge is to develop ways of thinking about how to compare and choose between specific models generated by these, and what may be other comparable methods of analysis. The penultimate slide (#17)  in the presentation highlights the options I think we can try out when faced with different kinds of differences between models.

The rest of this post responds to particular points that have been made by Michaela and Wolfgang, and then makes a more general conclusion.

Re  “1. The  decision tree analysis is not based on the same data set as our QCA” This is correct. I was in a bit of a quandary because while the original data set was fuzzy set (i.e. there intermediate values between 0 and 1) the solutions that were found were described in binary form i.e. the conditions and outcomes either were or were not present. I did produce a Decision Tree with the fuzzy set data but I had no easy means of comparing the results with the binary results of the QCA model. That said, Michaela and Wolfgang are right in expecting that such a model would be more complex and have more configurations.

Re “2. Decision tree analysis is compared with a type of QCA solution that is not meant to maximise parsimony.”  I agree that “If the purpose was to compare the parsimony of QCA results with those of decision trees, then the 'parsimonious' QCA solution should be used” But the intermediate solution was the solution that was available to me, and parsimony was not the only criteria of interest in my presentation. Accuracy (or consistency in QCA terms) was also of interest. But it was the difference in parsimony that stood out the most in this particular model comparison.

Re “3. The decision tree analysis performs less well than stated in the presentation” Here I think I disagree. The focus of the presentation is on consistency of those configurations that predict effective evaluations only (indicated in the tree diagram by squares with 0.0 value rather than 1.0 value ), not the whole model.  Among the three configurations that predict effective evaluations the consistency was 82%. Slide 15 may have confused the discussion because the figures there refer to coverage rather than consistency (I should have made this clear).

Re “none of the paths in our QCA is redundant”. The basis for my claim here was some simple color coding of each case according to which QCA configuration applied to them. Looking back at the Excel file it appears to me that cases 14 and 16 were covered by two configurations and cases 16 and 32 by another two configurations. BUT bear in mind this was done with the binary (crisp) data, not the fuzzy valued data. (The two configurations that did not seem to cover unique cases were  quanqca*sensit*parti_2  and qualqca*quanqca*sensit*compevi_3). The important point here is not that redundancy is “bad” but where it is found it can prompt us to think about how to investigate such cases if and when they arise (including when two different models provide alternate configurations for the same cases).

4. “The decision tree consistency measure is less rigorous than in QCA”       I am not sure that this matters in the case of the comparison at hand but it may matter when other comparisons are made. I say this because on the measures given on slide 13 the QCA model actually seems to perform better than the Decision Tree model. BUT again, a possibly confounding factor is the use of crisp versus fuzzy values behind the two measures. There is nevertheless a positive message here though, which is to look carefully into how the consistency measures are calculated for any two models being compared. On a wider note, there is an extensive array of performance measures for Decision Tree (aka classification) models that can be summarised in a structure known as a Confusion Matrix. Here is a good summary of these: http://www.saedsayad.com/model_evaluation_c.htm

Moving on, I am pleased that Michaela and Wolfgang have taken this extra step: “Intrigued by the idea of 'triangulating' QCA results with decision tree analysis, we have converted our QCA dataset into a binary format (as Rick did, see point 1 above) and conducted a csQCA with that data”. Their results show that the QCA model does better in three of four comparisons (twice on consistency levels and once on number of configurations). However, we differ in how to measure the performance of the Decision Tree model. Their count of configurations seem to involve double counting (4+4 for both types of outcome), whereas I count 3 and 2, reflecting a total of the 5 that exist in the tree. On this basis I see the Decision Tree model doing better on parsimony for both types of outcome but the QCA model doing better on consistency for both types of outcomes.

What would be really interesting to explore,  now that we have two more comparable models, is how much overlap there was in the contents of the configurations found by the two analyses, and the actual contents of those configurations i.e. the specific conditions involved. That is what will probably be of most interest to the donor (DFID) who funded the EVAW work. The findings could have operational consequences.

In addition to exploring the concrete differences between models based on the same data I think one other area that will be interesting to explore is how often the best levels of parsimony and accuracy can be found in one model versus one being available at the cost of the other in any given model. I suspect QCA may privilege consistency whereas Decision Tree algorithms might not do so. But this may simply reflect variations in analysis settings given for a particular analysis. This question has some wider relevance, since some parties might want to prioritise accuracy whereas others might want to prioritise parsimony. For example, a stock market investor could do well with a model that has 55% accuracy, whereas a surgeon might need 98%. Others might want to optimise both.

And a final word of thanks is appropriate, to Michaela and Wolfgang for making their data set publicly available for others to analyse. This is all too rare an event, but hopefully one that will become more common in the future, encouraged by donors and modeled by examples such as theirs.


Saturday, April 18, 2015

A mistaken criticism of the value of binary data



When reviewing a recent evaluation report I came across the following comment:
"Crisp set QCA where binary codings are used to establish the presence or absence of certain conditions does not facilitate a nuanced or granular analysis."
Wrong. Simply wrong.

A DFID strategy for promoting "improved governance" could be coded  as present or absent. This does seem crude, given the varieties of ways in which a governance strategy could actually be implemented. But the answer is not to ditch binary coding, but to extend it.

This can be done by breaking down the concept of "a strategy for improved governance"  into a number of component parts or attributes, and then coding for their presence/absence. The initial conception of the governance strategy is then deemed present if all 10 attributes are present. But it only takes a single change in one attribute at a time to produce 9 new versions of almost the same strategy. If you change two attributes at a time, there are ( 1 think... 1-(10 x 10) =) 100 new versions. If any number of attributes can be changed then this means there are 2 to the power of 10 possible configurations of the strategy, some of which may be very different from the present strategy. Basically it does not take much tweaking of the initial configuration before you will have nuances by the bucketful!

The limitations of the dis-aggregation-into-components approach have nothing to do with the nature of binary coding, but rather whether there are enough cases available to allow identification of the kinds of outcomes associated with the different varieties of configurations arising from the more micro-level coding of attributes.

If there are enough cases available, then learning about what works through the emergence (or planned development) of variations in the initial configuration then becomes possible. Some of these new versions of a governance strategy may work more effectively than the initial model, and others less so. Incremental exploration becomes possible.

For more on the idea of exploring adjacent variations in causal configurations see Andreas Wagner's very interesting (2014) book titled "The Arrival of the Fittest" which explores a theory of how innovation is possible in biological systems. Here is a review of the book, in the Times Higher Education website.

There is also a connection here, I think, with Stuart Kauffman's concept of "the adjacent possible", an idea also taken up by Stephen Johnson in his book "Where do good ideas come from: The natural history of innovation" Here is a review of the book in the Guardian

Postscript 2015 05 14: I heard the same"binary is crude"  criticism again today from a person attending a QCA presentation at the UK Evaluation Society Conference in London.

This time I will present another response. Binary judgments can be and often are derived from a dichotomised scale that captures graduations of the phenomena of interest. As Carroll Patterson pointed out today, with current QCA software it is now possible to experiment with varying the location of the cut-off point on such scales, and observe the consequences for the quality of the configurations that are then identified as the best fitting solutions The same approach is also possible with searches for best-fitting configurations using an evolutionary algorithm, which is another approach I have been experimenting with recently. It is also possible to go much further into the specific details of the underlying concept being measured by a scale by basing it on the aggregated output of a weighted checklist, like the kind I have described elsewhere. Basically, the limit to what is possible is defined by the imagination of the researcher/evaluator, not any inherent limitation of binary measures.

Postscript  2015 05 17: I tried to post a Comment below in reply to Anon's comment below, but Blogger.com wont accept any HTML formatting, so I will place the comment here instead.

RE "If, to combat the reductiveness of binary coding, you introduce a scale of 4-6 points, you still face the same problem in coding something more complex – a remote non expert is reducing a complex context and process to a number in an arbitrary way. "
Coding for QCA (and other purposes, such as when using NVIVO) should always be done in a way that is transparent and replicable, with attention to inter-rate reliability. It should certainly not be done in an “arbitrary way”
RE "Grading a large, diverse and complicated country on a scale of 0-1 or 1-5 on 'improved governance' is just ridiculous. Anyone who has studied the way people actually behave, governance, how decisions are really made or projects succeed or fail, will tell you that this reductiveness does not helpfully or accurately reflect reality."
QCA has been used in a field of Political Science since the 1980s and many of these applications have been cross-country analyses of political systems.
RE QCA is not qualitative – as it seeks to reduce a complex qualitative issue to a quantitative score - a number.
In crisp-set QCA data set the “number” 0 or 1 is actually a category not a numerical value. QCA could be done just as well by replacing  the 0’s and 1’s with the words “absent” and “present” 
RE "QCA is not comparative – the serious comparative part comes afterwards in some form of qualitative analysis, which researchers can choose. Looking at the truth table for patterns is the only form of comparison that QCA offers."
There are two levels of analysis involved in QCA: within-case analysis and between-case analysis.  At the beginning within-case analysis informs the selection of conditions to be included in a data set. When inconsistencies are found in an examination of configurations in a data set good practice advises a return to within-case analysis to identify missing conditions that can resolve these inconsistencies.  When these have been resolved and set of configurations has been identified that accounts for all case in the most parsimonious way possible,  these then need to be interpreted by reference to   the details of specific cases, with particular attention to more detailed process that connect the conditions making up the configurations.
RE "In my view QCA is a quantitative form of data management and pattern identification."
It does depend on what you mean by quantitative. It is based on a form of mathematics known as set theory, but that is about logical relationships, not quantities. In case there is any reservation about its significance, pattern identification is very important. In a data set with 10 different conditions there are 2 to the 10 different possible combinations of these that might be consistently associated with an outcome of interest. Finding these is like looking for a needle in a haystack. QCA and other methods like decision tree algorithms, help us find what part of the haystack the needle is most likely to be found. But as I said at the end of my section of the UKES presentation, finding a plausible configuration is not enough. It is necessary but not sufficient  for a strong causal claim. There also needs to be a plausible account of the likely causal mechanisms at work that connect the conditions in the configurations. These will only be found and confirmed through detailed within-case investigations, using methods like (but not only) process tracing. And the pattern finding has to be systematic, and transparent in the way it has been done. This is the case with QCA and Decision Tree modeling, where there are specifics algorithms used, both with their known limitations

There is a useful reference that may be of interest: Wagemann, C., Schneider, C.Q., 2007. Standards of Good Practice in Qualitative Comparative Analysis (qca) and Fuzzy-Sets. http://www.compasss.org/wpseries/WagemannSchneider2007.pdf









Friday, March 28, 2014

The challenges of using QCA



This blog posting is a response to my reading of the Inception Report written by the team who are undertaking a review of evaluations of interventions relating to violence against women and girls. The process of the review is well documented in a dedicated blog – EVAW Review

The Inception Report is well worth reading, which is not something I say about many evaluation reports! One reason is to benefit from the amount of careful attention the authors have given to the nuts and bolts of the process. Another is to see the kind of intensive questioning the process has been subjected to by the external quality assurance agents and the considered responses by the evaluation team. I found that many of the questions that came to my mind while reading the main text of the report were dealt with when I read the annex containing the issues raised by SEQUAS and the team’s responses to them.

I will focus on one issue that is challenge for both QCA and data mining methods like Decision Trees (which I have discussed elsewhere on this blog). That is the ratio of conditions to cases. In QCA conditions are attributes of the cases under examination that are provisionally considered as possible parts of causal configurations that explain at least some of the outcomes. After an exhaustive search and selection process the team has ended up with a set of 39 evaluations they will use as cases in a QCA analysis. After a close reading of these and other sources they have come up with a list of 20 conditions that might contribute to 5 different outcomes. With 20 different conditions there are 220 (i.e. 1,048,576) different possible configurations that could explain some or all of the outcomes. But there are only 39 evaluations, which at best will represent only 0.004% of the possible configurations. In QCA the remaining 1,048,537 are known as “logical remainders”. Some of these can usually be used in a QCA analysis through a process using explicit assumptions e.g. about particular configurations plus outcomes which by definition would be impossible to occur in real life. However, from what I understand of QCA practice, logical remainders would not usually exceed 50% of all possible configurations.

The review team has dealt with this problem by summarising the 20 conditions and 5 outcomes into 5 conditions and one outcome. This means there are 25 (i.e. 32) possible causal configurations, which is more reasonable considering there are 39 cases available to analyse. However there is a price to be paid for this solution, which is the increased level of abstraction/generality in the terms used to describe the conditions. This makes the task of coding the known cases more challenging and it will make the task of interpreting the results and then generalising from them more challenging as well. You can see the two versions of their model in the diagram below, taken from their report.
 
What fascinated me was the role of evaluation method in this model (see “Convincing methodology”). It is only one of five conditions that could explain some or all of the outcomes. It is quite possible therefore that all or some of the case outcomes could be explained without the use of this condition. This is quite radical, considering the centrality of evaluation methodology in much of the literature on evaluations. It may also be worrying to DFID in that one of their expectations of this review was it would “generate a robust understanding of the strengths, weaknesses and appropriateness of evaluation approaches and methods”. The other potential problem is that even if methodology is shown to be an important condition, its singular description does not provide any means to discriminating between forms which are more or less helpful.

The team seems to have responded to this problem by proposing additional QCA analyses, where there will be an additional condition that differentiates cases according to whether they used qualitative or quantitative methods.  However reviewers have still questioned whether this is sufficient. The team in return have commented that they will “add to the model further conditions that represent methodological choice after we have fully assessed the range of methodologies present in the set, to be able to differentiate between common methodological choices” It will be interesting to see how they go about doing this, while avoiding the problem of “insufficient diversity” of cases already mentioned above.

One possible way forward has been illustrated in a recent CIFOR Working Paper (Sehring et al, 2013) and which is also covered in Schneider and Wagemann (2012). They have illustrated how it is possible to do a “two-step QCA”, which differentiates between remote and proximate conditions. In the VAWG review this could take the form of an analysis of conditions other than methodology first, then a second analysis focusing on a number of methodology conditions. This process essentially reduces a larger number of remote conditions down to a smaller number of configurations that do make a difference to outcomes, which are then included in the second level of the analysis which uses the more proximate conditions. It has the effect of reducing the number of logical remainders. It will be interesting to see if this is the direction that the VAWG review team are heading.

PS 2014 03 30: I have found some further references to two-level QCA:
 And for people wanting a good introduction to QCA, see