Fault Prevention Ecosystem for 5G Mission Critical Services
5G (ENCQOR) Technology Development Challenge
Challenge Launch Date
May 24, 2019
Challenge Deadline
September 6, 2019 (This call for projects has expired. Notices of interest are no longer accepted.)
Challenge Statement
Mission Critical (MC) Audio, Video and Data (X) services require the 5G system to provide highly available services with low communication latency. Edge clouds are important enablers for MCX services because processing takes place close to the service point, which reduces network transmission latency. Reliability is critical for MCX services: they require service uptime of five nines (99.999%) or more and, in some cases, zero tolerance for failures. This requirement challenges the traditional fault management approach, which detects and fixes failures and relies on high-availability (HA) or fault-tolerance (FT) schemes to keep the service running. Even when such a scheme resumes a service after a failure, it cannot guarantee, for example, that no video frame is lost or delivered out of sequence during a remote surgery.
This research work will address this challenge by exploring proactive ways to handle faults and system failures. The goal is not only to detect the signs of faults early, but also to prevent potential infrastructure, enabling-system and application faults from occurring in MCX services. The work includes 1) efficient monitoring, 2) fault learning, understanding, modeling and classification, and 3) a fault prevention ecosystem.
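To give a sense of scale for the uptime requirement in the challenge statement, the short sketch below converts a number of nines into a yearly downtime budget. It is illustrative arithmetic only: the function name and the 365-day year are assumptions made for this example, not part of the call.

```python
# Illustrative arithmetic only: yearly downtime budget implied by an
# availability target expressed as a number of nines.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def downtime_budget_seconds(nines: int) -> float:
    """Return the maximum yearly downtime, in seconds, for `nines` nines of uptime."""
    unavailability = 10.0 ** (-nines)  # e.g. 5 nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability


for n in (3, 4, 5, 6):
    budget = downtime_budget_seconds(n)
    print(f"{n} nines: {budget:.2f} seconds per year ({budget / 60:.2f} minutes)")
```

Under these assumptions, five nines leaves a budget of roughly five minutes of downtime per year, which is why recovering after a failure may not be sufficient for MCX services.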
Project Partner
Ericsson Canada Inc.
Timeline
2 years
Available funding
$60,000 ($30,000/year for the two Ph.D. researchers)
Applicant Type
Quebec- or Ontario-based university (possibly selected through an RFP)
Location
The work can be carried out jointly in university labs and at Ericsson’s Montreal site
Project Details
The main scope includes:
- Monitoring system and data collection: Selecting monitoring tools that cover the infrastructure, operating system, and application. The tools need to provide a real-time and complete view of the system while keeping overhead low. Building an efficient data pipeline from collection through preprocessing to storage.
- Fault modeling: Learning from the selected MCX use case and studying/simulating the potential faults. Classifying faults and analyzing the characteristics of each fault type. Building a fault learning model that learns from historical faults and infers potential faults.
- Proactive fault handling solution: Selecting and training models to detect/predict known and unknown faults. Automating fault handling solutions based on the fault type. Proposing a fault handling and prevention ecosystem with the help of ML, AI and evolutionary methods. Defining the maturity level of the fault prevention system and using the result to aid MCX service provisioning.
- Testbed implementation and evaluation: Building a distributed 5G edge cloud testbed that runs the selected MCX use case. Testing the proposed solution and evaluating its performance.
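As a rough illustration of the monitoring and proactive detection items in the scope above, the sketch below flags a metric sample that deviates strongly from a sliding window of recent samples. It is a minimal statistical baseline under stated assumptions, not the method the project is expected to deliver: the class name `RollingZScoreDetector`, the window size, the threshold and the latency readings are all hypothetical, and a rolling z-score stands in here for the ML models the call asks applicants to select and train.

```python
from collections import deque
from statistics import mean, pstdev


class RollingZScoreDetector:
    """Flags samples that deviate strongly from a sliding window of recent samples.

    Hypothetical helper for illustration; a simple stand-in for a trained model.
    """

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.samples = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous relative to the current window."""
        anomalous = False
        # Only score once the window is full, so the baseline is meaningful.
        if len(self.samples) == self.samples.maxlen:
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        # Learn only from normal samples so a fault does not shift the baseline.
        if not anomalous:
            self.samples.append(value)
        return anomalous


# Made-up latency readings in milliseconds, with one injected spike (14.5 ms).
latencies_ms = [10.0 + 0.1 * (i % 5) for i in range(30)] + [10.2, 14.5, 10.1]

detector = RollingZScoreDetector(window=20, threshold=3.0)
flags = [detector.observe(x) for x in latencies_ms]
alerts = [i for i, flagged in enumerate(flags) if flagged]
print("alert indices:", alerts)
```

In a testbed, the flagged samples would feed whatever automated fault handling the project defines; the detector itself only raises the early warning.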
Deliverables:
- A fault model that formally learns, defines, specifies and measures the severity level of faults
- An automatic fault prediction and recovery framework
- A proactive fault prevention ecosystem
- A PoC code library for the fault prevention ecosystem
- Demonstration of the ecosystem on the testbed
- Technical reports
Project Goals/Outcomes
Outcomes of this project include the following:
- A fault prevention ecosystem
- A PoC that demonstrates the fault prevention system
- Scientific papers in reputable journals and international conferences
- Patents
Applicant Capabilities
Two Ph.D.-level researchers with:
- Strong background in cloud computing and distributed systems
- Good understanding of evolutionary algorithms
- Experience in statistical modelling, machine learning, neural networks, data analytics, and reinforcement learning
- Good programming skills in Python or R, and hands-on knowledge of Docker, containers and Kubernetes
- Knowledge of edge computing, IoT or sensor networks is a plus
- Knowledge of the telecom management network FCAPS model is a plus
Additional Information
- N/A