RAAC LLM Analysis Code Package


Author Information

Author: Dr Lixu Liu

Position: Research Fellow

Contact:

Affiliations:

Associated Research Paper

This code package is the official implementation of the research paper:

Title: "Leveraging Large Language Models to Classify and Inspect Defects in Reinforced Autoclaved Aerated Concrete (RAAC)"

Status: Under peer review

Authors: Liu, L. et al.

Year: 2025

Important Notice: This code package is currently under embargo and will be made publicly accessible after the associated research paper is published. The repository is maintained for reference purposes during the peer review process.

Overview

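This repository contains the full Python codebase supporting our research paper on RAAC defect analysis using LLMs. The code implements a structured, LLM-based workflow to extract RAAC definitions, check documents for explicit RAAC mentions, extract defect information using a seven-question framework, and aggregate the extracted defect data into a master dataset.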

The scripts were developed using the Claude 3 Opus API and tested on a dataset of UK-based documents.

Features

Project Structure

Core Files

File | Description
README.md | Project documentation and setup instructions

1. Definition Extraction

Script | Description
01-Prompt Setting 1-a_RAAC_DefinitionExtraction.py | Extracts RAAC definitions or generates contextual summaries
01-Prompt Setting 1-b_RAAC_DefinitionPresenceCheck_BinaryFlag.py | Binary classification for RAAC definition presence

2. Explicit Mention Check

Script | Description
02_Prompt Setting 2-a_RAAC_ExplicitMentionCheck.py | Determines explicit RAAC mentions in documents
02_Prompt Setting 2-b_RAAC_ExplicitMentionCheck_BinaryFlag.py | Binary classification for RAAC mentions

3. Defect Analysis

Script | Description
03_RAAC_DefectExtraction_SevenQuestions.py | Extracts defect information using 7-question framework
04_RAAC_CombineExtractedDefectData.py | Aggregates defect data into master dataset
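
As a rough illustration of what aggregating into a master dataset can look like, the sketch below merges a folder of per-document CSV files into one file. This is not the code in 04_RAAC_CombineExtractedDefectData.py: the per-document CSV layout, the `source_file` column, and the `combine_csvs` helper are all assumptions made for the example.

```python
import csv
from pathlib import Path


def combine_csvs(input_dir, output_path):
    """Merge every CSV in input_dir into one file; return the number of rows written.

    Hypothetical helper: the real script's input and output formats may differ.
    Write output_path outside input_dir so a rerun does not re-read the result.
    """
    rows = []
    fieldnames = ["source_file"]  # assumed extra column recording each row's origin
    for path in sorted(Path(input_dir).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                row["source_file"] = path.name
                rows.append(row)
                # Collect the union of column names, in first-seen order.
                for name in row:
                    if name is not None and name not in fieldnames:
                        fieldnames.append(name)
    with open(output_path, "w", newline="", encoding="utf-8") as fh:
        # restval fills columns a file lacks; extrasaction skips ragged-row leftovers.
        writer = csv.DictWriter(
            fh, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Taking the union of column names keeps the merge tolerant of files that answer only some of the questions; missing cells are left empty rather than dropped.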

Installation

  1. Clone the repository:

    git clone https://github.com/lixuliu/raac-llm-analysis-code-package.git
    cd raac-llm-analysis-code-package
  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set up your API key:

    export ANTHROPIC_API_KEY='your-api-key-here'  # On Windows: set ANTHROPIC_API_KEY=your-api-key-here
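
To confirm from Python that the key is visible before running any script, a small check like the one below may help. It is a minimal sketch using only the standard library; `get_api_key` is a hypothetical helper, not a function from this repository, and it does not contact the API or validate the key itself.

```python
import os


def get_api_key(env=None):
    """Return the Anthropic API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    # Treat the placeholder from the instructions above as "not set".
    if not key or key == "your-api-key-here":
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it before running the scripts."
        )
    return key
```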

Usage

  1. Set up your environment:

    # Set your Anthropic API key
    export ANTHROPIC_API_KEY='your-api-key-here'  # On Windows: set ANTHROPIC_API_KEY=your-api-key-here
    
    # Install required dependencies
    pip install -r requirements.txt
  2. Prepare your data:

    • Create a folder containing your PDF documents to analyze
    • Ensure all PDFs are readable and not corrupted
  3. Run the analysis pipeline:

    # Step 1a: Extract RAAC definitions
    python "01-Prompt Setting 1-a_RAAC_DefinitionExtraction.py"
    
    # Step 1b: Check for RAAC definition presence
    python "01-Prompt Setting 1-b_RAAC_DefinitionPresenceCheck_BinaryFlag.py"
    
    # Step 2a: Check for explicit RAAC mentions
    python "02_Prompt Setting 2-a_RAAC_ExplicitMentionCheck.py"
    
    # Step 2b: Check for RAAC mention presence
    python "02_Prompt Setting 2-b_RAAC_ExplicitMentionCheck_BinaryFlag.py"
    
    # Step 3: Extract defect information
    python 03_RAAC_DefectExtraction_SevenQuestions.py
    
    # Step 4: Combine the results
    python 04_RAAC_CombineExtractedDefectData.py
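
The preparation and run steps above can also be sketched as a small Python driver. The script names and their order are taken from the commands listed above; everything else is an assumption for illustration: the helper names are hypothetical, the `%PDF-` header test is only a cheap heuristic that will not catch every corrupted file, and the sketch does not pass the PDF folder to the scripts because how they locate their input is not documented here.

```python
import subprocess
import sys
from pathlib import Path

# Script names copied from the Usage section, in the documented order.
PIPELINE = [
    "01-Prompt Setting 1-a_RAAC_DefinitionExtraction.py",
    "01-Prompt Setting 1-b_RAAC_DefinitionPresenceCheck_BinaryFlag.py",
    "02_Prompt Setting 2-a_RAAC_ExplicitMentionCheck.py",
    "02_Prompt Setting 2-b_RAAC_ExplicitMentionCheck_BinaryFlag.py",
    "03_RAAC_DefectExtraction_SevenQuestions.py",
    "04_RAAC_CombineExtractedDefectData.py",
]


def unreadable_pdfs(folder):
    """Return names of *.pdf files in folder that lack the %PDF- header."""
    bad = []
    # Note: the glob pattern is case-sensitive on most non-Windows systems.
    for path in sorted(Path(folder).glob("*.pdf")):
        with open(path, "rb") as fh:
            if fh.read(5) != b"%PDF-":
                bad.append(path.name)
    return bad


def pipeline_commands(python=sys.executable):
    """Build one argument list per script; list form avoids shell quoting issues."""
    return [[python, script] for script in PIPELINE]


def run_pipeline(pdf_folder):
    """Check the PDFs, then run each step in order, stopping at the first failure."""
    bad = unreadable_pdfs(pdf_folder)
    if bad:
        raise ValueError(f"These files do not look like PDFs: {bad}")
    for cmd in pipeline_commands():
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a failing step
```

Calling `run_pipeline("path/to/pdfs")` from the repository root would run the six steps in sequence; passing the commands as lists means the spaces in the script names need no quoting.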

Acknowledgements

This research was supported by:

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
You are free to share and adapt the material for non-commercial purposes, provided you give appropriate credit. Commercial use is not permitted without explicit permission from the author.