This project is not covered by Drupal’s security advisory policy.

This module extracts content from Word, OpenDocument Text, and RTF documents for use with Document Loader, using the phpoffice/phpword PHP library.

Supported Input Formats:

  • Word 2007+ (.docx)
  • Word 2003 (.doc)
  • OpenDocument Text (.odt)
  • Rich Text Format (.rtf)
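If you call PHPWord directly, each supported input format corresponds to one of the library's reader names (Word2007, MsDoc, ODText, RTF). The sketch below shows that mapping as a hypothetical helper, phpword_reader_for(). The helper's name and its exception behaviour are assumptions for illustration, not how this module is implemented.

```php
<?php
// Hypothetical helper for illustration only; it is not part of this
// module's code. It maps a file's extension to the name of the PHPWord
// reader that handles that format.
function phpword_reader_for(string $path): string {
  // Compare extensions case-insensitively (e.g. "REPORT.DOCX").
  $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
  $readers = [
    'docx' => 'Word2007', // Word 2007+
    'doc' => 'MsDoc',     // Word 2003 (binary format)
    'odt' => 'ODText',    // OpenDocument Text
    'rtf' => 'RTF',       // Rich Text Format
  ];
  if (!isset($readers[$extension])) {
    // Assumed behaviour: reject anything outside the supported list.
    throw new InvalidArgumentException("Unsupported extension: {$extension}");
  }
  return $readers[$extension];
}
```

With phpoffice/phpword installed, the returned name can be passed as the second argument of \PhpOffice\PhpWord\IOFactory::load($path, $readerName).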

Supported Output Formats:

  • text
  • html
  • markdown

Note on RTF: RTF support is best-effort because PHPWord's RTF reader is limited. It does not preserve headings or lists, and it may drop special characters such as smart quotes, accented letters, and dashes.

Requirements

This module requires the following modules:

  • Document Loader

Installation

composer require drupal/document_loader_phpword

Configuration

  1. Enable the module at Administration > Extend.
  2. PHPWord will then be listed as an available plugin in the Document Loader configuration at admin/config/media/document-loader.

Similar Projects

  • AI File To Text: Leverages the AI module to improve the output of loaded documents