Cumulative Reasoning with Large Language Models

Tsinghua University, Shanghai AI Lab, Shanghai Qizhi Institute

Abstract

Despite recent advancements in language models (LMs), their ability to solve complex problems remains limited. This paper introduces Cumulative Reasoning (CR), a novel approach that uses LMs cumulatively and iteratively, mirroring human thought processes for problem-solving. CR decomposes tasks into smaller, manageable components and leverages previous propositions for effective composition, significantly enhancing problem-solving capabilities. We demonstrate CR's superiority on several complex reasoning tasks: it outperforms existing methods on logical inference tasks by up to 9.3%, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, a 24% improvement over the prior state of the art. Additionally, CR sets a new state of the art on the MATH dataset, achieving a 4.2% increase over previous methods and a 43% relative improvement on the most challenging problems. By extending CR to incorporate a code environment, without external aids such as retrieval or web browsing, we further harness the computational and logical reasoning capabilities of LLMs, achieving a remarkable 72.2% accuracy on the MATH dataset and outperforming the PAL method by 38.8%. Our work not only sets new state-of-the-art results but also paves the way toward more sophisticated AI reasoning methods.

Cumulative Reasoning

TLDR: We introduce Cumulative Reasoning (CR) that enhances LLMs' problem-solving abilities by orchestrating an iterative and compositional process involving different roles, demonstrating superior performance across a range of complex tasks.

CR introduces a novel framework that leverages three specialized LLM roles in a collaborative reasoning process (a minimal code sketch follows the list):

1. Proposer: Suggests potential steps based on the current context, initiating the reasoning cycle.

2. Verifier(s): Assess the proposer's suggestions for accuracy, incorporating valid steps into the ongoing context.

3. Reporter: Determines the appropriate moment to conclude the reasoning process, based on whether the accumulated context leads to a definitive solution.
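To make the division of roles concrete, here is a minimal sketch of the CR loop in Python. The `llm` callable, the prompt wording, and the stopping heuristic are illustrative assumptions for exposition, not the paper's exact implementation.

```python
from typing import Callable

def cumulative_reasoning(
    question: str,
    llm: Callable[[str], str],   # assumed: any text-in / text-out LLM interface
    max_iterations: int = 8,
) -> str:
    """Minimal sketch of the CR loop: Proposer -> Verifier -> Reporter."""
    context: list[str] = []  # accumulated, verified propositions

    for _ in range(max_iterations):
        # Proposer: suggest the next step given the question and verified context.
        proposal = llm(
            f"Question: {question}\n"
            f"Verified propositions so far: {context}\n"
            "Propose ONE new proposition that follows from the above."
        )

        # Verifier: judge the proposed step; only valid steps enter the context.
        verdict = llm(
            f"Premises: {context}\n"
            f"Proposed step: {proposal}\n"
            "Is this step logically valid? Answer 'yes' or 'no'."
        )
        if verdict.strip().lower().startswith("yes"):
            context.append(proposal)

        # Reporter: decide whether the accumulated context settles the question.
        report = llm(
            f"Question: {question}\n"
            f"Verified propositions: {context}\n"
            "If these propositions determine the final answer, state it; "
            "otherwise reply 'CONTINUE'."
        )
        if "CONTINUE" not in report:
            return report

    return "No definitive answer within the iteration budget."
```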


Framework Overview

[Figure 2: Overview of the CR framework, showing how a solution is iteratively constructed and refined from initial propositions to a final conclusion.]

Our approach is visualized in Figure 2, which illustrates how CR iteratively constructs and refines a solution from initial propositions to a final conclusion. In practice, the proposer is ideally a model pre-trained on related derivation tasks, while verifiers translate its proposals into formal systems for validation, employing either symbolic reasoning systems or a code environment.
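As one concrete instance of verification through a code environment, a proposed Game of 24 solution can be checked mechanically rather than by another LLM call. The function below is an illustrative sketch under that assumption, not the paper's implementation.

```python
import ast
import operator

# Allowed binary operators for a Game of 24 expression.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node: ast.AST) -> float:
    """Safely evaluate an arithmetic expression AST (numbers and + - * / only)."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return float(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("disallowed expression")

def verify_game24(expression: str, numbers: list[int]) -> bool:
    """Check that `expression` uses exactly `numbers` (once each) and equals 24."""
    try:
        tree = ast.parse(expression, mode="eval")
        value = _eval(tree.body)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    used = sorted(
        int(n.value) for n in ast.walk(tree)
        if isinstance(n, ast.Constant) and isinstance(n.value, (int, float))
    )
    return used == sorted(numbers) and abs(value - 24) < 1e-6

# Example: verify_game24("(10 - 4) * (13 - 9)", [4, 9, 10, 13]) -> True
```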

While specialized models offer optimal performance, the flexibility of CR permits effective deployment with general-purpose LLMs such as GPT-4, tailored through role-specific prompting. Note that our method introduces several LLM roles with "fresh eyes" by managing the thinking context of each role, going beyond the self-verification capabilities of a single language model.
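When a single general-purpose model plays all roles, the "fresh eyes" effect comes from giving each role only the context it needs rather than the full transcript. The helper below sketches this context management; the role names match the framework, but the exact context contents are an assumption for illustration.

```python
# Illustrative: each role sees a trimmed, role-specific context ("fresh eyes"),
# not the full transcript of every previous role's output.
def build_role_context(role: str, question: str,
                       verified: list[str], proposal: str = "") -> str:
    if role == "proposer":
        return f"Question: {question}\nVerified propositions: {verified}"
    if role == "verifier":
        # The verifier only sees the premises and the single step under review.
        return f"Premises: {verified}\nStep under review: {proposal}"
    if role == "reporter":
        return f"Question: {question}\nVerified propositions: {verified}"
    raise ValueError(f"unknown role: {role}")
```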

The underlying rationale for CR draws from intuitionistic logic and the philosophy of mathematical constructivism—asserting that a cumulative, constructive approach is inherently suited for complex reasoning tasks. This methodology not only allows for the dynamic adjustment of the reasoning trajectory based on intermediate validations but also significantly enhances the problem-solving efficacy of LLMs.


Citation

Please cite the paper and star this repo if you use Cumulative Reasoning (CR) and find it interesting or useful. Thanks!

@inproceedings{zhang2024cumulative,
  title={Cumulative Reasoning with Large Language Models},
  author={Zhang, Yifan and Yang, Jingqin and Yuan, Yang and Yao, Andrew Chi-Chih},
  booktitle={ICLR 2024 Workshop on Bridging the Gap Between Practice and Theory in Deep Learning},
  year={2024},
  url={https://openreview.net/forum?id=XAAYyRxTlQ}
}