GenAI-Powered Training Assistant with Role-Based Access & Multimodal RAG

DEWEY: GenAI-Powered Training Assistant for Field Sales and Home Office Users

Deliver the Right Training Knowledge, to the Right Role, at the Right Time.

Overview

Static training repositories do not train people; they store information. The gap between content existing and knowledge being accessible is exactly where field productivity is lost.

DEWEY is a GenAI-powered conversational assistant integrated directly into the PEP Assist (Promotional Education Program) website, built to serve both Field Salespersons and Home Office Users. Rather than manually searching through 100+ training documents and non-searchable video content, users interact with DEWEY in natural language, receiving precise, role-specific answers from the full training repository, instantly.

Industry Challenges

Business Challenges
  • Unmanageable Content Volume: Over 100 unstructured PDF training documents with no intelligent retrieval mechanism, making accurate knowledge access impractical at scale.
  • Non-Searchable Video Content: Video-based learning materials, a significant portion of the training library, were entirely unsearchable, leaving critical knowledge inaccessible.
  • Inefficient Knowledge Retrieval: Manual document search produced inaccurate, time-consuming results with no role-awareness, leading to low trust and poor adoption.
  • Access Governance: No mechanism existed to enforce role-based access to documents, creating compliance and governance risk across user groups.

Technical Challenges
  • No Native PDF Extraction Plugin: Dataiku had no built-in capability for extracting text, tables, and images from complex PDF structures, requiring a purpose-built, reusable custom plugin.
  • Vector Database Synchronisation: Ensuring accurate, real-time embedding sync within the vector database as content was updated demanded careful architectural design.
  • RBAC via Metadata Filtering: Implementing Role-Based Access Control (RBAC) within the vector database using metadata filtering, rather than at the application layer, required a deliberate and non-standard implementation approach.

DEWEY was built with a clear vision: transform a static training repository into an intelligent, role-aware AI learning assistant that enables field sales and home office users to learn faster, access knowledge precisely, and improve field performance with confidence.

Multimodal Content Ingestion: The ingestion pipeline was designed to process the full spectrum of training content: complex PDF documents, structured data, and video-based learning materials. A custom PDF extraction plugin was built within Dataiku to extract text, tables, and images from documents of varying structure, creating a reusable component for future AI initiatives. Video content was indexed through a purpose-built pipeline, making previously unsearchable learning materials fully queryable.
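As an illustration of how video content can become queryable, the sketch below merges timed transcript segments into retrievable chunks that keep their time range. It assumes the video pipeline produces a timestamped transcript, which this case study does not confirm; the `Chunk` schema, the `chunk_transcript` function, and the field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One retrievable unit of training content (hypothetical schema)."""
    text: str
    source: str      # file name of the PDF or video
    modality: str    # e.g. "pdf_text", "pdf_table", "pdf_image", "video"
    locator: str     # page number or time range inside the source
    roles: list = field(default_factory=list)  # roles allowed to see it


def chunk_transcript(segments, source, roles, max_chars=200):
    """Merge timed transcript segments into chunks of roughly max_chars.

    `segments` is a list of (start_seconds, end_seconds, text) tuples,
    assumed to come from a speech-to-text step that is not shown here.
    """
    chunks, buffer, start, end = [], [], None, None
    for seg_start, seg_end, text in segments:
        # Close the current chunk if adding this segment would overflow it.
        if buffer and len(" ".join(buffer)) + 1 + len(text) > max_chars:
            chunks.append(Chunk(" ".join(buffer), source, "video",
                                f"{start:.0f}s-{end:.0f}s", list(roles)))
            buffer, start = [], None
        if start is None:
            start = seg_start
        buffer.append(text)
        end = seg_end
    if buffer:
        # Flush whatever is left after the last segment.
        chunks.append(Chunk(" ".join(buffer), source, "video",
                            f"{start:.0f}s-{end:.0f}s", list(roles)))
    return chunks
```

Keeping the time range on each chunk lets an answer point the user to the relevant moment in a video, in the same way a page number does for a PDF.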

Role-Based Access Control via Metadata Filtering: RBAC was implemented directly within ChromaDB using metadata filtering, ensuring every retrieval query is scoped to the user's role before semantic search is applied. Field Salespersons and Home Office Users access only the content relevant to their role, eliminating cross-role content leakage and meeting enterprise compliance requirements.
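The filter-then-rank order described here can be sketched in plain Python. This is a minimal in-memory illustration of the idea, not DEWEY's ChromaDB implementation; the record schema, role names, and the `retrieve` function are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, records, user_role, k=3):
    """Return the ids of the top-k records the role may see.

    Each record is a dict with "id", "embedding" and "roles" keys
    (a hypothetical schema used only for this sketch).
    """
    # Step 1: scope by role. Records outside the role never reach ranking.
    allowed = [r for r in records if user_role in r["roles"]]
    # Step 2: semantic ranking over the already-scoped candidates.
    ranked = sorted(allowed,
                    key=lambda r: cosine(query_vec, r["embedding"]),
                    reverse=True)
    return [r["id"] for r in ranked[:k]]


records = [
    {"id": "pricing-guide", "embedding": [0.9, 0.1], "roles": ["home_office"]},
    {"id": "visit-playbook", "embedding": [0.8, 0.2], "roles": ["field_sales"]},
    {"id": "code-of-conduct", "embedding": [0.1, 0.9],
     "roles": ["field_sales", "home_office"]},
]

print(retrieve([1.0, 0.0], records, "field_sales"))
# prints ['visit-playbook', 'code-of-conduct']
```

Although "pricing-guide" is the closest match to the query, it never appears for a field sales user because it is removed before ranking. In ChromaDB the same scoping is typically expressed by passing a `where` metadata filter to `collection.query`; the metadata key used for roles would depend on how the collection was populated.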

Conversational Natural Language Interface: Users interact with DEWEY through a natural language conversational interface integrated into the PEP Assist website, asking questions and receiving precise, sourced answers without navigating documents manually.
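One common way to obtain sourced answers is to number the retrieved passages in the prompt and ask the model to cite them. The sketch below shows that pattern under the assumption that retrieval has already returned role-filtered passages; `build_prompt` and the prompt wording are hypothetical, not DEWEY's actual prompt.

```python
def build_prompt(question, passages):
    """Assemble a grounded prompt from retrieved passages.

    `passages` is a list of (source_label, text) tuples, assumed to be
    already filtered for the user's role by the retrieval step.
    """
    context = "\n".join(
        f"[{i}] ({label}) {text}"
        for i, (label, text) in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered passages below and cite them "
        "like [1]. If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


prompt = build_prompt(
    "How do I log a visit?",
    [("Visit Playbook, p. 4", "Log visits in the CRM the same day.")],
)
```

The resulting string would be sent to the foundation model; the citation markers in its answer can then be mapped back to document names and pages in the interface.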

Smart Vector Synchronisation: An intelligent embedding sync mechanism ensures the vector database remains current as training content is updated, maintaining retrieval accuracy without manual re-indexing.
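The case study does not describe how the sync mechanism works. One plausible approach, sketched here as an assumption, is to store a content hash per document and re-embed only what has changed; `fingerprint` and `plan_sync` are hypothetical names.

```python
import hashlib


def fingerprint(text):
    """Stable content hash used to detect changed documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(current_docs, indexed_hashes):
    """Compare the repository with the index and list the work to do.

    current_docs:   {doc_id: text} as it exists in the repository now
    indexed_hashes: {doc_id: hash} recorded at the last indexing run
    """
    current_hashes = {doc_id: fingerprint(text)
                      for doc_id, text in current_docs.items()}
    # New or modified documents need fresh embeddings.
    to_upsert = sorted(doc_id for doc_id, h in current_hashes.items()
                       if indexed_hashes.get(doc_id) != h)
    # Documents that disappeared must be removed from the index.
    to_delete = sorted(doc_id for doc_id in indexed_hashes
                       if doc_id not in current_hashes)
    return to_upsert, to_delete, current_hashes


indexed = {"a": fingerprint("old"), "b": fingerprint("same"),
           "c": fingerprint("gone")}
current = {"a": "new", "b": "same", "d": "added"}
to_upsert, to_delete, new_hashes = plan_sync(current, indexed)
print(to_upsert, to_delete)
# prints ['a', 'd'] ['c']
```

Unchanged documents such as "b" are skipped entirely, which avoids re-embedding the whole library on every content update.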

Visual Content Analysis: AWS Bedrock and Vision Large Language Models (VLLMs) were used to analyse image-based and visual content within training documents, ensuring no information was lost due to non-text formatting.

Technologies Used

Dataiku: Collaborative enterprise AI platform for pipeline development, custom plugin creation, and governed ML deployment at scale

Snowflake: Cloud-native data platform for secure, scalable data warehousing and training content management

AWS + AWS Bedrock: Cloud infrastructure and managed foundation model services powering Generative AI inference and VLLM-based visual content analysis

ChromaDB: Open-source vector database for semantic embeddings storage, metadata-filtered RBAC, and fast retrieval

Python: Core language for data engineering, AI pipeline development, and custom plugin architecture

Flask: Lightweight web framework for the conversational AI interface and API layer

Quantified Business Value

  • 300 field sales users
  • 20 minutes saved per user per day
  • 220 working days per year

22,000 hours saved annually

At an average sales cost of $40 per hour, the estimated productivity gain is ~$880,000 annually.
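The figures above follow directly from the stated inputs, as this short calculation shows. The variable names are illustrative; the numbers are those given in this section.

```python
users = 300
minutes_saved_per_day = 20
working_days = 220
hourly_cost_usd = 40

# Total minutes saved per year, converted to hours.
hours_saved = users * minutes_saved_per_day * working_days / 60
annual_value_usd = hours_saved * hourly_cost_usd

print(f"{hours_saved:,.0f} hours, ${annual_value_usd:,.0f}")
# prints "22,000 hours, $880,000"
```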

Conclusion

DEWEY demonstrates that Generative AI transforms training infrastructure from a passive content repository into an active, intelligent knowledge partner: one that knows who is asking, what they are authorised to access, and how to surface the right answer instantly.

By combining multimodal RAG, metadata-driven RBAC, a custom reusable plugin architecture, and a seamless conversational interface, DEWEY delivered measurable productivity gains, stronger governance, and a scalable AI foundation that extends well beyond a single use case.

"A training library is only as valuable as how quickly the right person can find what they need. DEWEY made that instant, and made sure only the right person could."

πby3