Singapore Management University
Singapore Management University (SMU) used Acquia Cloud to transform static PDFs into dynamic, reusable content with a scalable AI-powered solution.
The Client
Singapore Management University (SMU) is a higher education institution in Singapore that manages course data across various schools and stakeholders. The university's IT team oversees all public-facing websites at SMU and is responsible for developing the course management system.
The Situation
SMU needed to transition course data from the end-of-life SharePoint Portal — where all course PDFs were previously hosted — to a sustainable, scalable platform. The basic requirement was to replicate the file directory system in Drupal, but the IT team identified an opportunity to go beyond simply storing the PDFs. Recognizing the value of converting content-rich PDFs into reusable data, they sought to develop an AI-powered parsing method to automate this process.
While useful as standalone documents, the course PDFs used non-standard formats, which made it difficult to reuse or share their data across other school sites. Additionally, because course content is updated twice a year at the start of each new term, the university needed an efficient, accurate way to manage these updates without repetitive manual tasks.
The Challenge
The project faced significant challenges in reliably parsing course PDFs into structured JSON data. Initially, the team fed the PDFs directly into the GPT model, but this approach yielded inconsistent results. The model struggled to extract and interpret content from the raw files, likely because of the complexity of converting raw PDF data into usable text.
Another major hurdle was processing thousands of PDF files efficiently. Sending all of these requests to Azure AI services simultaneously triggered throttling because of the service's rate limits. The team needed a solution that could handle the volume of PDFs and maintain a consistent workflow without exceeding those limits.
The Solution
SMU's IT division developed the Course Management System in-house, leveraging their expertise in Drupal, Azure services, and AI technologies. The IT team built the system on Drupal, powered by Acquia Cloud Platform.
To address the PDF parsing challenge, the team incorporated an open-source Node.js library called "PDFExtract" to pre-process the PDFs by converting them into plain text before passing the data to the GPT model. This not only improved the accuracy of the output but also reduced the computational resources required, resulting in faster response times.
For the technical implementation, the team used Azure to host some of its services, including a Function App that runs a Node.js endpoint to handle PDF parsing. The custom Drupal module sends PDFs to this endpoint for processing. Once a PDF has been converted to text, the content is passed to Azure AI services, where a GPT-based model processes it. The team carefully crafted prompts so that the model returns consistently structured JSON output, with fields such as school term, course code, course title, course description, and instructors. The module then uses this data to create course nodes automatically in Drupal.
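As an illustration of the structured output described above, the following minimal Python sketch shows how a model response could be checked before a course node is created. It is not SMU's implementation, which lives in a custom Drupal module; the key names (`school_term`, `course_code`, and so on), the `CourseRecord` class, and the `parse_course_json` function are assumptions chosen to mirror the fields listed in the text.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical key names mirroring the fields named in the case study.
REQUIRED_FIELDS = (
    "school_term",
    "course_code",
    "course_title",
    "course_description",
    "instructors",
)


@dataclass
class CourseRecord:
    school_term: str
    course_code: str
    course_title: str
    course_description: str
    instructors: List[str]


def parse_course_json(raw: str) -> CourseRecord:
    """Validate a model response and convert it into a CourseRecord."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    # Reject responses that omit any required field.
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))

    # Accept a single instructor given as a string by wrapping it in a list.
    instructors = data["instructors"]
    if isinstance(instructors, str):
        instructors = [instructors]
    if not isinstance(instructors, list):
        raise ValueError("instructors must be a string or a list")

    return CourseRecord(
        school_term=str(data["school_term"]),
        course_code=str(data["course_code"]),
        course_title=str(data["course_title"]),
        course_description=str(data["course_description"]),
        instructors=[str(name) for name in instructors],
    )
```

Validating the response in this way means a malformed or incomplete reply can be rejected and retried instead of producing a partial course node.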
To solve the processing volume challenge, SMU's IT specialists built a custom queue system to process the initial batch of 1,000 PDFs in smaller, manageable increments. Designed to run on a cron schedule, the queue spreads requests over time so that they stay within service limits while maintaining a consistent workflow.
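The queue-and-cron approach can be sketched in a few lines of Python. This is an illustrative sketch only, not the Drupal queue SMU built; the function names, the batch size of 25, and the retry behavior (failed items are put back for a later run) are all assumptions.

```python
from collections import deque
from typing import Callable, Deque, List


def run_cron_batch(
    queue: Deque[str],
    process: Callable[[str], None],
    batch_size: int = 25,  # assumed value; the source does not state one
) -> List[str]:
    """Process at most batch_size items per cron run and return those handled."""
    handled: List[str] = []
    # The iteration count is fixed up front, so requeued items wait for a later run.
    for _ in range(min(batch_size, len(queue))):
        item = queue.popleft()
        try:
            process(item)
            handled.append(item)
        except Exception:
            # Assumed policy: put the failed item back so a later run retries it.
            queue.append(item)
    return handled


def count_runs_to_drain(total_items: int, batch_size: int) -> int:
    """Count how many cron runs are needed to empty a queue when nothing fails."""
    queue: Deque[str] = deque(f"course-{i}.pdf" for i in range(total_items))
    runs = 0
    while queue:
        run_cron_batch(queue, lambda path: None, batch_size)
        runs += 1
    return runs
```

Under these assumptions, `count_runs_to_drain(1000, 25)` returns 40: a batch of 1,000 PDFs is cleared in 40 scheduled runs, with no more than 25 requests sent in any single run.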
Acquia Cloud Platform played an important role in facilitating this innovation. Its robust, scalable hosting environment ensured the smooth handling of high data volumes during the migration and processing stages while providing consistent performance for multiple stakeholders accessing the data.
The Results
The Course Management System project delivered transformative results for SMU, significantly improving efficiency, accuracy, and scalability in managing course information:
Time savings: Previously, processing a single PDF manually — reading, extracting relevant information, and creating a node in Drupal — took an estimated 30 minutes per file. With over 1,000 PDFs, this would have required approximately 500 hours of manual effort. Through automation, this process now requires minimal human intervention, saving an estimated 90% of the time and allowing staff to focus on higher-value tasks.
Improved accuracy: Manual data entry often introduced inconsistencies or errors. By automating the process, the project has achieved a 100% consistency rate in formatting and data output, ensuring accuracy across all course nodes.
Cross-site efficiency: With the data now structured as reusable nodes, course information can be easily shared and displayed across multiple SMU websites. This eliminates redundant work for web admins, saving an additional 40 hours per term and ensuring all sites display consistent, up-to-date information.
Scalability: The team designed the system to handle the biannual updates, eliminating repetitive manual tasks and ensuring each update is completed efficiently and accurately.
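The time-savings estimate above follows from simple arithmetic, reproduced here as a short Python sketch. The 500-hour baseline comes from the figures in the text (1,000 PDFs at 30 minutes each). The 450-hour result is derived, not reported by SMU, and assumes the 90% saving applies to that full baseline.

```python
PDF_COUNT = 1_000        # "over 1,000 PDFs" in the text; 1,000 used as a floor
MINUTES_PER_PDF = 30     # estimated manual effort per file
SAVINGS_PERCENT = 90     # estimated share of time saved through automation


def manual_hours(pdf_count: int, minutes_per_pdf: int) -> int:
    """Total manual effort in whole hours."""
    return pdf_count * minutes_per_pdf // 60


def hours_saved(baseline_hours: int, savings_percent: int) -> int:
    """Hours saved, assuming the percentage applies to the whole baseline."""
    return baseline_hours * savings_percent // 100


baseline = manual_hours(PDF_COUNT, MINUTES_PER_PDF)  # 500
saved = hours_saved(baseline, SAVINGS_PERCENT)       # 450
```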
This project has fundamentally changed how SMU handles course information, enabling faster updates, greater collaboration, and a more efficient workflow. The results highlight SMU's leadership in digital experience innovation, leveraging cutting-edge technologies to deliver tangible business benefits.