DAFCC – H2020, HORIZON EUROPE

Project code: PN-III-P3-3.5-EUK-2019-0234

Project title: Data Analysis and Profiling Framework for CALL CENTERS

Acronym: DAFCC

Every call center faces periods of intense demand, during which agents need to resolve issues efficiently to reduce average call handling time, minimize the number of calls on hold and limit the abandoned call rate. In these situations the pressure is significant, and prolonged on-hold time amplifies customer dissatisfaction. Call centers typically accumulate a large amount of data, including emails, call logs, various types of messages and even audio recordings. Analyzing this data using traditional statistical methods is difficult. Analyzing audio recordings is even more complicated because it requires manual transcription, a time-consuming process. Recent advances in natural language processing (NLP) allow the use of machine learning to analyze unstructured datasets using text-mining techniques and speech-to-text transcription algorithms.

The aim of this project is to introduce a deep learning natural language processing layer between the customer and call center staff. Based on analyzing several types of data, including customer emails and messages, call statistics (waiting time, call duration) and real-time transcripts of the call itself, this layer will help the agent to make strategic decisions and prioritize calls.

Project Stages and timeline

Stage 1: Definition of use cases, system requirements and architecture (01/09/2021 – 31/12/2021)

Activity 1.1 Use cases and test plan definition

In this activity two use cases are described:

a) The chatbot module intended for local administrative centers to manage building permit information.

b) The voicebot module for user interaction with a monitoring application to obtain or communicate specific information related to pollution, temperature or water levels, for example.

To define the chatbot module, a sample corpus for training the chatbot is presented. The corpus is built in accordance with the Romanian legislation in this field and is structured as labeled sets of questions, each set with a single answer, covering a wide range of issues.
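The corpus structure described above (labeled sets of questions, each with a single answer) can be sketched in Python as follows. This is only an illustration: the labels, questions, answers and the word-overlap matching rule are assumptions made for the example, not the project's actual corpus or the matching method used by the chatbot.

```python
import re

# Hypothetical corpus: each label groups several question variants with ONE answer.
CORPUS = {
    "permit_documents": {
        "questions": [
            "What documents do I need for a building permit?",
            "Which papers are required to apply for a building permit?",
        ],
        "answer": "Illustrative answer: the list of required documents.",
    },
    "permit_validity": {
        "questions": [
            "How long is a building permit valid?",
            "When does my building permit expire?",
        ],
        "answer": "Illustrative answer: the validity period of the permit.",
    },
}


def tokens(text):
    """Lowercase a text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def best_label(query):
    """Return the label whose question variants overlap most with the query.

    Overlap is measured with the Jaccard index on word sets (an assumption
    chosen for simplicity). Returns None when nothing overlaps.
    """
    query_tokens = tokens(query)
    best, best_score = None, 0.0
    for label, entry in CORPUS.items():
        for question in entry["questions"]:
            question_tokens = tokens(question)
            union = query_tokens | question_tokens
            score = len(query_tokens & question_tokens) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = label, score
    return best


def answer(query):
    """Return the single answer attached to the best-matching label, or None."""
    label = best_label(query)
    return CORPUS[label]["answer"] if label else None
```

Keeping one answer per label means that adding a new phrasing of a question only extends the `questions` list and never duplicates the answer text.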

The voicebot module is also illustrated through a simple example of user interaction with a monitoring application to obtain or communicate specific information. Finally, issues related to the testing process are presented, and a test plan is outlined.

Activity 1.2 Defining functional and non-functional requirements

Activity 1.2 highlights the role of system components and integration requirements. The speech-to-text module has the most important role in the phone interaction between customers and the call center. All other modules are dependent on its output. Thus transcription must be done in real time and as accurately as possible.
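The two requirements above (accuracy and real-time operation) are commonly quantified with the word error rate (WER) and the real-time factor (RTF). The sketch below shows how these two standard metrics can be computed; it is an assumption that they fit this project, since the summary does not state which metrics were actually used.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j]: edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; values below 1 keep up with live audio."""
    return processing_seconds / audio_seconds
```

For example, a transcript that drops one word from a five-word reference has a WER of 0.2, and a recognizer that needs 2 seconds to process 10 seconds of audio has an RTF of 0.2.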

Activity 1.3 Defining the general architecture

Activity 1.3 presents the architecture of the DAFCC system. The system is composed of several natural language processing modules covering both text and audio. The main modules are presented in detail: the speech-to-text (speech recognition) module, the chatbot, the topic extraction module and the sentiment analysis module.

Results

The system described in the first stage includes two use cases: a Chatbot for managing local administrative information for building permits and a VoiceBot for interacting with an environmental monitoring application. The testing ensured correct functioning of the modules, including the semantic analysis of questions and their association with the corresponding answers. The system consists of natural language processing modules for conversation, semantic analysis, speech recognition and sentiment analysis.

Stage 2: Definition of models and development of the speech recognition module (01/01/2022 – 31/12/2022)

Activity 2.1 Collection and validation of the Romanian audio corpus

This activity describes how the two speech corpora were collected. The first was created to simulate conversations in a bank call center. The second was collected from a real Romanian call center. The recordings in the second corpus require conversion to PCM format. Corpus validation issues are also presented.

Activity 2.2 Experimenting with different DNN (Deep Neural Network) methodologies and choosing the optimal solution

In this activity solutions based on multi-layer neural networks are presented in the domains of:

a) Speech recognition

b) Natural Language Processing

Several solutions for implementing speech recognition have been analyzed: the Speech-to-Text solution from Google, solutions based on the Whisper system (open-source software from OpenAI, released in 2022), Kaldi and Notta.

NLP module implementation solutions, including Python-based ones, were also analyzed.

Activity 2.3 Developing the speech recognition solution

In this activity the building blocks of a speech recognition system are presented: acoustic model, phonetic dictionary, language model. The acoustic model is the main element, developed with the help of machine learning technologies. Methods for training and validation of acoustic models and their integration into the speech recognition solution were discussed.
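The interplay of the three building blocks can be illustrated with a toy Python sketch. Everything in it is hypothetical: the words, the phoneme sequences, the bigram probabilities and the acoustic scores are invented for the example, and a real recognizer searches a far larger hypothesis space. The sketch only shows the principle that the phonetic dictionary links words to sounds, while the acoustic score and the language model score are combined to choose the final transcript.

```python
import math

# Hypothetical phonetic dictionary: word -> phoneme sequence.
LEXICON = {
    "nivel": ["n", "i", "v", "e", "l"],
    "apa": ["a", "p", "a"],
    "aer": ["a", "e", "r"],
}

# Hypothetical bigram language model: P(word | previous word).
BIGRAMS = {
    ("<s>", "nivel"): 0.6,
    ("nivel", "apa"): 0.7,
    ("nivel", "aer"): 0.1,
}


def pronunciation(words):
    """Expand a word sequence into the phoneme sequence the acoustic model scores."""
    return [phoneme for word in words for phoneme in LEXICON[word]]


def lm_log_prob(words, floor=1e-4):
    """Log-probability of a word sequence under the bigram model.

    Unseen bigrams receive a small floor probability (an assumption).
    """
    total = 0.0
    previous = "<s>"
    for word in words:
        total += math.log(BIGRAMS.get((previous, word), floor))
        previous = word
    return total


def best_hypothesis(acoustic_scores, lm_weight=1.0):
    """Pick the word sequence with the highest combined score.

    acoustic_scores maps a tuple of words to the (hypothetical) acoustic
    log-likelihood of that sequence given the audio.
    """
    return max(
        acoustic_scores,
        key=lambda words: acoustic_scores[words] + lm_weight * lm_log_prob(words),
    )
```

With the values above, a hypothesis that is slightly worse acoustically can still win when the language model considers it much more likely.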

Activity 2.4 Experimenting with a multilingual concept

This activity presents the principles of multilingual acoustic model validation tools. One such tool, eTranslation, is described, together with how it can be integrated with a chatbot (in our case BotStudio) to receive questions in different languages and generate answers in the same language. Examples of Romanian corpora of scripts that can be used to develop specific applications are also presented.

Results

In Stage 2, audio corpora were collected and validated, DNN (Deep Neural Network) methods for speech recognition were tested, and a speech recognition solution with acoustic models was developed. Finally, the principles of multilingual acoustic model validation tools were presented, together with one such tool, eTranslation, and its integration with a chatbot.

Stage 3: Definition of models and development of the Text Mining module; Integration of system components (01/01/2023 – 31/12/2023)

Activity 3.1 Definition of the technological architecture and technical specifications for Text Mining models

This activity describes in detail the text mining concept and the elements involved in its implementation: the speech-to-text (speech recognition) module, the topic extraction module, the keyword extraction module, the named entity identification module, the sentiment analysis module and the chatbot. To this end, natural language processing is presented as the fundamental domain that links human communication and computerized systems. The text mining processing sequence and the use cases are described. Finally, the software tools used to develop these modules, mainly based on the Python programming language and its libraries, are described.

Activity 3.2 Construction of prediction models and development of the Text Mining module

In this activity several text processing tasks based on natural language processing techniques are detailed. The first section presents several variants of keyword extraction, in Romanian and English, using various Python libraries, with examples (e.g. on minutes of meetings). The second section describes methods for identifying named entities, along with the libraries and language models used for English and Romanian, and presents results obtained on some concrete cases. The third section presents techniques for identifying the type of a message (question, suggestion). The fourth section presents different ways to implement sentiment analysis and discusses some examples. The scientific report contains the code used to implement these modules, the results obtained and their performance or accuracy.
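Two of the tasks above, keyword extraction and message type identification, can be illustrated with a minimal Python sketch that uses only the standard library. This is not the code from the scientific report: the stop-word list, the cue phrases and the frequency-based ranking are assumptions chosen to keep the example self-contained, whereas the project relied on dedicated Python libraries and language models.

```python
import re
from collections import Counter

# Illustrative English lists; a real module would use per-language resources.
STOP_WORDS = {"the", "and", "for", "was", "with", "please", "that", "this", "are"}
QUESTION_STARTERS = {"what", "when", "where", "who", "why", "how",
                     "can", "could", "do", "does", "is", "are"}
SUGGESTION_CUES = ("you should", "i suggest", "it would be better", "please consider")


def extract_keywords(text, top_n=3):
    """Return the top_n most frequent words, ignoring stop words and short words."""
    # [^\W\d_]+ matches runs of letters, including letters with diacritics.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def message_type(text):
    """Classify a message as 'question', 'suggestion' or 'other' with simple rules."""
    normalized = text.strip().lower()
    first_word = normalized.split()[0] if normalized else ""
    if normalized.endswith("?") or first_word in QUESTION_STARTERS:
        return "question"
    if any(cue in normalized for cue in SUGGESTION_CUES):
        return "suggestion"
    return "other"
```

Rule-based baselines like this one are easy to inspect, which makes them a useful reference point when evaluating the accuracy of model-based variants.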

Activity 3.3 Design and development of graphical user interfaces

The graphical user interface has been designed to simplify text manipulation and analysis, providing access to the essential analysis functionality. The elements of the interface (language selection, file loading, choice of functionality, etc.) and the technology used in the implementation are described.

Activity 3.4 Design and development of interfaces and connectors for communication between components

This activity presented the interfaces and connectors realized for communication between components. By following the OpenAPI public standard for implementing and describing connectors, the easy integration of text analytics functionality with other services or platforms is possible.
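As an illustration of what an OpenAPI description of such a connector might look like, the sketch below builds a minimal OpenAPI 3.0 document as a Python dictionary and prints it as JSON. The endpoint path `/keywords`, the field names and the title are hypothetical and are not taken from the project's actual connectors.

```python
import json


def build_openapi_description():
    """Return a minimal OpenAPI 3.0 document for a hypothetical keyword endpoint."""
    return {
        "openapi": "3.0.3",
        "info": {"title": "Text analytics connector (illustrative)", "version": "0.1.0"},
        "paths": {
            "/keywords": {  # hypothetical path
                "post": {
                    "summary": "Extract keywords from a text",
                    "requestBody": {
                        "required": True,
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "required": ["text"],
                                    "properties": {
                                        "text": {"type": "string"},
                                        "language": {"type": "string", "enum": ["ro", "en"]},
                                    },
                                }
                            }
                        },
                    },
                    "responses": {
                        "200": {
                            "description": "Extracted keywords",
                            "content": {
                                "application/json": {
                                    "schema": {
                                        "type": "object",
                                        "properties": {
                                            "keywords": {
                                                "type": "array",
                                                "items": {"type": "string"},
                                            }
                                        },
                                    }
                                }
                            },
                        }
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_openapi_description(), indent=2))
```

Because the description is machine-readable, other services can generate client code or validate requests against it without access to the module's source code.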

Results

At this stage significant results have been achieved in the development of the text mining module. Essential functions from the field of text analysis were introduced and implemented: keyword identification, named entity recognition, text summarization, message type determination and sentiment extraction. During this stage, an intuitive graphical user interface was designed in order to facilitate user interaction with the developed modules. The connections between modules and interfaces were also established for efficient communication between system components.

Stage 4: Prototype testing and validation; Dissemination and market (01/01/2024 – 30/08/2024)

Activity 4.1 Technical evaluation of the test environment

The main objective of this activity was to define and evaluate the test environment. Starting from the data processing flow diagram of a call center, the system requirements for the test environment are defined: hardware requirements (type, number, capacity of devices and servers); software requirements (operating system, applications, libraries, drivers); network requirements.

Activity 4.2 Integrated platform testing and validation

This activity describes the main steps and objectives of the testing: validating the functionality, performance and usability of the software; assessing the readiness and stability of the software for release.

Activity 4.3 Configuring the system for going into production

This activity aimed at providing the hardware and software elements necessary for the proper functioning of the platform, their assembly and testing under real conditions.

Activity 4.4 Market analysis and market exploitation strategy definition

This activity analyzes the market for call center data analytics solutions and defines a strategy for exploiting it. Growth opportunities and main competitors have been identified, and the commercialization strategies of the project have been outlined. The solutions to be offered for commercialization have been defined and a corresponding price offer has been created.

Activity 4.5 Dissemination of project results

This activity describes the dissemination of project results through the publication of articles, the creation and updating of a dedicated web page on the company’s website with information related to the project status, and participation in industry events and conferences. These actions aimed to communicate and promote the project results to a wider audience and potential customers.

Results

In this stage, the main components of the platform, i.e. the speech-to-text transcription module and the NLP text processing module, were integrated in order to process data in audio and text form and to deliver information about its content. For example, using the keywords found in customer conversations, the recognized named entities and the sentiment analysis results, it is possible to establish a customer profile, i.e. the customer’s preferred products and the customer’s attitude towards them. A market analysis of solutions in this area was also conducted and a strategy for exploiting this market was developed. In the last activity of this stage, the results were disseminated through publications, by publishing key information about the project on the project website, and through discussions with potential beneficiaries at various events, fairs and conferences.
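The customer profile mentioned above can be sketched as a simple aggregation. In the Python example below, the input format (pairs of product name and sentiment score in the range -1 to 1) and the function name are assumptions made for illustration; the platform's actual data model is not described in this summary.

```python
from collections import defaultdict


def build_profile(mentions):
    """Aggregate (product, sentiment_score) pairs into a per-product profile.

    Products are ordered by how often the customer mentions them, so the
    first entries approximate the customer's preferred products.
    """
    scores = defaultdict(list)
    for product, score in mentions:
        scores[product].append(score)
    profile = {
        product: {
            "mentions": len(values),
            "mean_sentiment": sum(values) / len(values),
        }
        for product, values in scores.items()
    }
    return dict(sorted(profile.items(), key=lambda item: item[1]["mentions"], reverse=True))
```

Here the product names would come from keyword extraction or named entity recognition, and the scores from sentiment analysis of the sentences in which they appear.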

Published articles

Gavat, Inge & Griparis, Andreea & Segarceanu, Svetlana. (2023). Natural language processing in assistive technologies. Romanian Journal of Technical Sciences – Applied Mechanics. 68. 129-140. 10.59277/RJTS-AM.2023.2-3.02.

Also, an abstract of the potential paper entitled “NLP-Based Solutions for Call Center Optimization” has been prepared for presentation at the International Symposium for Design and Technology in Electronic Packaging 2024 conference, to be held in October 2024.

Participation in events

The project was disseminated at several major industry events.

Green Energy Expo & Romenvirotec

Between April 11-13, 2024, BEIA Consult International participated as an exhibitor at the Green Energy Expo & Romenvirotec, alongside national and international industry leaders. The company’s booth displayed posters and distributed leaflets with key information about the projects it is involved in, including the DAFCC project.

During the event, constructive discussions were held with potential beneficiaries and people interested in the company’s activities, including the DAFCC project. The company’s participation was shared on social media networks where it has an active presence (X, LinkedIn). Event website: https://greenenergyexpo-romenvirotec.ro/.

PoliFest

BEIA Consult International was present at PoliFest 2024, which took place on April 18-20, 2024. The event brought together many major players from various industrial and research sectors, both national and international. The team provided information to the visitors and answered their questions, and the DAFCC project, among others, was disseminated. Discussions were initiated for potential collaborations with various interested individuals and organizations. The company’s participation was promoted on social media networks, highlighting its involvement in industry events. More information about the event: https://polifest.upb.ro/.

Electronics, Computers and Artificial Intelligence (ECAI)

At the Electronics, Computers and Artificial Intelligence conference, held June 27-28, 2024, the company presented two papers for its projects.

During the event, constructive discussions were held with potential national and international partners, exploring opportunities for future collaboration. For more information about the conference: https://ecai.ro/.

SISOM 2023

At this conference, held in Bucharest on September 21-22, 2023, a presentation titled “How can Natural Language Processing help to improve Medical Assistance” was given. The authors of this presentation were Inge Gavăt, Svetlana Segărceanu and Andreea Griparis. This work was supported by UEFISCDI Romania and the Ministry of Research, Innovation and Digitalization through the DAFCC project. Conference website – https://imsar.ro/sisom_conf/.

GoTech World 2023

During this exhibition, BEIA Consult International participated as an exhibitor alongside national and international industry leaders. Posters were displayed on the stand and leaflets were distributed with key information about the projects the company is involved in, including the DAFCC project. Discussions were held with potential beneficiaries and people interested in the project activities. The company’s participation was highlighted on the social media networks where it has a presence (X, LinkedIn, Facebook). Website of the event – https://www.gotech.world/.