RAAC LLM Analysis Code Package

Author Information

Author: Dr Lixu Liu

Position: Research Fellow

Contact:

Affiliations:

Associated Research Paper

This code package is the official implementation of the research paper:

Title: "Artificial Intelligence-assisted Literature Screening with Empirical Validation in Reinforced Autoclaved Aerated Concrete Research"

Status: In Preparation

Authors: Liu, L. et al.

Year: 2025

Important Notice: This code package is currently under embargo and will be made publicly accessible after the associated research paper is published. The repository is maintained for reference purposes during the peer review process.

Overview

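This repository contains the full Python codebase supporting our research on AI-assisted literature screening and empirical validation in RAAC research. The code implements an AI-powered workflow to extract RAAC definitions (or generate contextual summaries where no definition is given), check documents for explicit RAAC mentions, and, experimentally, extract defect information and aggregate it into a master dataset.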

The scripts were developed using the Claude 3 Opus API and tested on a comprehensive dataset of UK-based RAAC research documents. They provide automated literature screening capabilities for researchers.

Features

Project Structure

Note: Scripts 1-4 (Definition Extraction and Explicit Mention Check) are the core validated components supporting the main research paper. Scripts 5-6 (Defect Analysis) are experimental components developed as part of this research but not yet empirically validated; they are provided for the research community to build upon.

Core Files

| File | Description |
|------|-------------|
| README.md | Project documentation and setup instructions |

1-2. Definition Extraction (Validated Core Components)

| Script | Description |
|--------|-------------|
| 01-a_RAAC_DefinitionExtraction.py | Extracts RAAC definitions or generates contextual summaries |
| 01-b_RAAC_DefinitionPresenceCheck_BinaryFlag.py | Binary classification for RAAC definition presence |
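
The binary-flag script reduces a model's answer to a present/absent label. The repository's actual prompt and parsing logic are not reproduced in this README, so the following is only a minimal sketch, assuming the model has been asked to begin its answer with "YES" or "NO". The helper name `parse_binary_flag` is hypothetical.

```python
from typing import Optional


def parse_binary_flag(response_text: str) -> Optional[int]:
    """Map a free-text YES/NO answer to 1, 0, or None if ambiguous.

    Assumption (not taken from the repository): the prompt instructs the
    model to start its reply with the single word YES or NO.
    """
    words = response_text.strip().split()
    if not words:
        return None  # empty reply: leave the flag undecided
    # Drop trailing punctuation such as "Yes," or "No." before comparing.
    first = words[0].strip(".,:;!\"'").upper()
    if first == "YES":
        return 1
    if first == "NO":
        return 0
    return None  # anything else is flagged for manual review
```

Returning `None` rather than guessing keeps ambiguous replies visible, so they can be checked by hand instead of silently counted as 0.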

3-4. Explicit Mention Check (Validated Core Components)

| Script | Description |
|--------|-------------|
| 02-a_RAAC_ExplicitMentionCheck.py | Determines explicit RAAC mentions in documents |
| 02-b_RAAC_ExplicitMentionCheck_BinaryFlag.py | Binary classification for RAAC mentions |
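
The scripts above perform this check with an LLM. For sanity-checking their output, a plain keyword baseline can be useful. The sketch below is not the method used in the repository; it is a simple regular-expression check, and the names `RAAC_PATTERN` and `has_explicit_raac_mention` are hypothetical.

```python
import re

# Matches the acronym as a whole word, or the spelled-out term with any
# whitespace (including line breaks from PDF extraction) between words.
RAAC_PATTERN = re.compile(
    r"\bRAAC\b|reinforced\s+autoclaved\s+aerated\s+concrete",
    re.IGNORECASE,
)


def has_explicit_raac_mention(text: str) -> bool:
    """Return True if the text contains 'RAAC' or the full term."""
    return RAAC_PATTERN.search(text) is not None
```

Documents where this baseline and the LLM-based flag disagree are natural candidates for manual review.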

5-6. Defect Analysis (Experimental Components)

| Script | Description |
|--------|-------------|
| 03_DefectExtraction_SevenQuestions.py | Extracts defect information using 7-question framework |
| 04_CombineExtractedDefectData.py | Aggregates defect data into master dataset |
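
The aggregation step can be pictured as concatenating per-document result files into one table. The output format of the extraction script is not documented here, so the sketch below assumes, purely for illustration, one CSV file per document in a single folder. The function name `combine_csv_files` and the added `source_file` column are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List


def combine_csv_files(input_dir: Path, output_file: Path) -> int:
    """Concatenate every CSV in input_dir into output_file; return row count.

    Assumption: each CSV has a header row. Columns missing from a file are
    left blank in the combined output.
    """
    rows: List[Dict[str, str]] = []
    fieldnames: List[str] = ["source_file"]
    for csv_path in sorted(input_dir.glob("*.csv")):
        # Skip a previous master file if it lives in the same folder.
        if csv_path.resolve() == output_file.resolve():
            continue
        with csv_path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                row["source_file"] = csv_path.name  # keep provenance
                rows.append(row)
    with output_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Recording the source file name on every row makes it possible to trace any entry in the master dataset back to the document it came from.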

Installation

  1. Clone the repository:

    git clone https://github.com/lixuliu/raac-llm-analysis-code-package.git
    cd raac-llm-analysis-code-package
  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set up your API key:

    export ANTHROPIC_API_KEY='your-api-key-here'  # On Windows: set ANTHROPIC_API_KEY=your-api-key-here
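
To confirm that the key is visible to Python before running anything, a small check like the one below can help. It is a sketch rather than code from the repository, and `get_api_key` is a hypothetical helper name; only the environment variable name `ANTHROPIC_API_KEY` comes from the step above.

```python
import os


def get_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running the scripts."
        )
    return key
```

Failing early with an explicit message is usually easier to diagnose than an authentication error raised partway through a batch of documents.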

Usage

  1. Set up your environment:

    # Set your Anthropic API key
    export ANTHROPIC_API_KEY='your-api-key-here'  # On Windows: set ANTHROPIC_API_KEY=your-api-key-here
    
    # Install required dependencies
    pip install -r requirements.txt
  2. Prepare your data:

    • Create a folder containing your PDF documents to analyze
    • Ensure all PDFs are readable and not corrupted
  3. Run the analysis pipeline:

    # Step 1a: Extract RAAC definitions
    python "01-a_RAAC_DefinitionExtraction.py"
    
    # Step 1b: Check for RAAC definition presence
    python "01-b_RAAC_DefinitionPresenceCheck_BinaryFlag.py"
    
    # Step 2a: Check for explicit RAAC mentions
    python "02-a_RAAC_ExplicitMentionCheck.py"
    
    # Step 2b: Check for RAAC mention presence
    python "02-b_RAAC_ExplicitMentionCheck_BinaryFlag.py"
    
    # Step 3: Extract defect information
    python 03_DefectExtraction_SevenQuestions.py
    
    # Step 4: Combine the results
    python 04_CombineExtractedDefectData.py
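
The data-preparation and pipeline steps above can also be scripted. The following is a minimal sketch, not part of the repository: `find_unreadable_pdfs` and `run_pipeline` are hypothetical helpers, the PDF check only looks for the `%PDF-` header and does not prove a file parses fully, and how each script locates the PDF folder is not covered here.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script names as listed in the usage steps, in execution order.
PIPELINE_SCRIPTS = [
    "01-a_RAAC_DefinitionExtraction.py",
    "01-b_RAAC_DefinitionPresenceCheck_BinaryFlag.py",
    "02-a_RAAC_ExplicitMentionCheck.py",
    "02-b_RAAC_ExplicitMentionCheck_BinaryFlag.py",
    "03_DefectExtraction_SevenQuestions.py",
    "04_CombineExtractedDefectData.py",
]


def find_unreadable_pdfs(folder: Path) -> List[Path]:
    """Return *.pdf files whose first 1024 bytes lack the %PDF- header."""
    bad: List[Path] = []
    for pdf in sorted(folder.glob("*.pdf")):
        with pdf.open("rb") as handle:
            if b"%PDF-" not in handle.read(1024):
                bad.append(pdf)
    return bad


def run_pipeline(script_dir: Path) -> None:
    """Run each script in order with the current interpreter.

    check=True stops the pipeline at the first script that exits non-zero.
    """
    for script in PIPELINE_SCRIPTS:
        print(f"Running {script} ...")
        subprocess.run([sys.executable, str(script_dir / script)], check=True)
```

A typical use would be to call `find_unreadable_pdfs(Path("pdfs"))` first, where `pdfs` is a placeholder folder name, and to call `run_pipeline(Path("."))` only if the returned list is empty.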

Acknowledgements

This research was supported by:

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
You are free to share and adapt the material for non-commercial purposes, provided you give appropriate credit. Commercial use is not permitted without explicit permission from the author.