Cumulative Reasoning with Large Language Models

Tsinghua University, Shanghai Qi Zhi Institute

Abstract

Recent advances in large language models (LLMs) are remarkable, yet their ability to solve complex problems remains limited. In this work, we introduce Cumulative Reasoning (CR), a structured framework that enhances LLM problem-solving by emulating human-like iterative and cumulative thought processes. CR orchestrates LLMs in three distinct roles (Proposer, Verifier(s), and Reporter) to systematically decompose tasks, generate and validate intermediate reasoning steps, and compose them into a solution by building a dynamic Directed Acyclic Graph (DAG) of verified propositions. We demonstrate CR's advantage on several complex reasoning tasks, showing significant improvements in logical inference, arithmetic puzzles, and advanced mathematics.

Key Results

Our experiments show that CR consistently and significantly outperforms prior methods such as Chain-of-Thought (CoT) and Tree-of-Thoughts (ToT) across a variety of challenging reasoning benchmarks.

  • Mathematical Reasoning (MATH): Achieves a 4.2% absolute improvement over previous methods and a 43% relative improvement on the most challenging Level 5 problems. When augmented with a code interpreter, CR outperforms Program-of-Thoughts by 38.8%.
  • Arithmetic Reasoning (Game of 24): Reaches 98% accuracy, a 24% absolute improvement over ToT, while exploring significantly fewer reasoning states.
  • Logical Inference (FOLIO-wiki): Attains up to 98.04% accuracy on a curated version of the dataset, a relative improvement of up to 9.3% over strong baselines.

Cumulative Reasoning

TLDR: We introduce Cumulative Reasoning (CR), a framework that enhances LLMs' problem-solving abilities by orchestrating an iterative, compositional process across distinct roles, delivering superior performance on a range of complex tasks.

CR is a framework in which Large Language Models (LLMs) play three specialized roles in a collaborative reasoning process (a minimal sketch of the loop follows the list):

1. Proposer: Suggests potential steps based on the current context, initiating the reasoning cycle.

2. Verifier(s): Assess the Proposer's suggestions for accuracy and incorporate valid steps into the ongoing context.

3. Reporter: Determines the appropriate moment to conclude the reasoning process, based on whether the accumulated context leads to a definitive solution.     
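
Concretely, one round of this cycle can be pictured as the loop below. This is only an illustrative sketch, not the released implementation: `llm` stands in for whatever chat-completion call you use, and the prompts are simplified placeholders.

```python
from typing import List

def llm(prompt: str) -> str:
    """Placeholder for a single LLM call (e.g., a chat-completion request)."""
    raise NotImplementedError  # supply your own model call here

def cumulative_reasoning(premises: List[str], question: str, max_rounds: int = 8) -> str:
    # The cumulative context holds the premises plus every proposition the
    # Verifier has accepted so far (the nodes of the reasoning DAG).
    context: List[str] = list(premises)
    for _ in range(max_rounds):
        # Proposer: suggest one new intermediate proposition from the current context.
        proposal = llm(
            "Premises and verified propositions:\n" + "\n".join(context)
            + f"\nQuestion: {question}\nPropose ONE new proposition that follows."
        )
        # Verifier: accept the proposal only if it follows from the context.
        verdict = llm(
            "Context:\n" + "\n".join(context)
            + f"\nProposition: {proposal}\nDoes this follow from the context? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            context.append(proposal)
        # Reporter: stop once the accumulated context determines the answer.
        report = llm(
            "Context:\n" + "\n".join(context)
            + f"\nQuestion: {question}\nIf the answer is determined, state it; otherwise reply CONTINUE."
        )
        if "CONTINUE" not in report:
            return report
    return "No definitive answer within the round budget."
```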


Framework Overview

Figure 2: Overview of the Cumulative Reasoning (CR) process

Our approach is visualized in Figure 2, which illustrates how CR iteratively constructs and refines a solution from initial propositions to a final conclusion. In practical terms, the Proposer is ideally a model pre-trained on related derivation tasks, while Verifiers translate its proposals into formal systems for validation, using either symbolic reasoning systems or a code environment.
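
For instance, when propositions are arithmetic steps (as in the Game of 24), the Verifier need not be an LLM at all. The sketch below is a hypothetical, minimal code-environment verifier that re-evaluates a proposed expression in a restricted Python interpreter; `verify_step` and its interface are illustrative assumptions, not part of the released code.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected by the verifier.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node):
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("disallowed expression")

def verify_step(expression: str, claimed_value: float) -> bool:
    """Return True iff the proposed expression really evaluates to the claimed value."""
    try:
        return abs(_eval(ast.parse(expression, mode="eval")) - claimed_value) < 1e-6
    except (ValueError, SyntaxError, ZeroDivisionError):
        return False

# Example: the Proposer claims "(13 - 9) * (10 - 4) = 24"; the verifier confirms it.
assert verify_step("(13 - 9) * (10 - 4)", 24)
```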

While specialized models offer the best performance, CR is flexible enough to be deployed with general-purpose LLMs such as GPT-4, tailored through role-specific prompting. Note that our method gives each role "fresh eyes" by managing its thinking context separately, going beyond the self-verification capabilities of a single language model.
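
One simple way to realize this with a single general-purpose model, shown in the sketch below, is to give each role its own system prompt and a fresh conversation containing only the verified context. Here `chat` and `ROLE_PROMPTS` are illustrative placeholders rather than the paper's exact prompts.

```python
# Each role gets its own system prompt and sees ONLY the verified context,
# so the Verifier judges a proposal with "fresh eyes" rather than inheriting
# the Proposer's full chain of thought.
ROLE_PROMPTS = {
    "proposer": "Derive one new proposition that follows from the given premises.",
    "verifier": "Judge whether the given proposition follows from the premises. Answer yes or no.",
    "reporter": "Decide whether the premises already determine the final answer; if so, state it.",
}

def ask(role: str, verified_context: str, task: str, chat) -> str:
    # A fresh conversation per call: no hidden state is shared between roles.
    messages = [
        {"role": "system", "content": ROLE_PROMPTS[role]},
        {"role": "user", "content": f"{verified_context}\n\nTask: {task}"},
    ]
    return chat(messages)
```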

The underlying rationale for CR draws on intuitionistic logic and the philosophy of mathematical constructivism, which hold that a cumulative, constructive approach is inherently suited to complex reasoning. This methodology allows the reasoning trajectory to be adjusted dynamically based on intermediate validations, and it significantly enhances the problem-solving efficacy of LLMs.


Citation

Please cite the paper and star this repo if you use Cumulative Reasoning (CR) and find it interesting/useful, thanks!

@article{zhang2023cumulative,
  title={Cumulative Reasoning With Large Language Models},
  author={Zhang, Yifan and Yang, Jingqin and Yuan, Yang and Yao, Andrew Chi-Chih},
  journal={Transactions on Machine Learning Research; arXiv preprint arXiv:2308.04371},
  year={2023}
}