Even when dozing off, we are still able to mumble »Set the alarm for 6.30 am« in the direction of our smartphone. The response from the calmingly electronic voice gives us the certainty we need to fall asleep peacefully. Hidden behind the scenes of this every-day exchange are complex data analysis methods. A combination of natural language generation (NLG), natural language processing (NLP), and machine learning (ML) conjures up an intelligent assistance for all occasions.
Today, we benefit from data analysis methods in almost all areas of life – not just in our private lives, but also in industry. For example, when a new product is to be launched on the market, a great deal of data with important, mostly implicit insights into the product itself along with the associated manufacturing processes is being generated from the very beginning of its life cycle. Having employees in product development evaluate it would be a daunting task. This is where machine learning methods come into picture. Such methods evaluate the existing data automatically. Employees can use the resulting findings to optimize manufacturing processes, solve problems more quickly, as well as improve overall product quality. The possibilities are practically endless.
In Cockpit 4.0, a collaborative project with Rolls Royce Germany, researchers at the Fraunhofer Institute for Production Systems and Design Technology IPK developed a virtual decision support system that uses ML methods to derive useful insights for solving manufacturing and assembly problems. For example, if a turbine part is missing during the assembly process, a team can be tasked with locating it. This pre-vents delays in the assembly process. For this purpose, three ML approaches were tested using the available data:
1. Natural Language Processing
Natural Language Processing (NLP) is a machine learning method for processing natural language. In the project, NLP was used to identify matches with previous descriptions of problems in order to allow inferences to be drawn about current cases. A distinction is made between automatically and self-trained models, whereby the latter are trained independently by the researchers using specific vocabulary.
Three different NLP algorithms were used:
a bag-of-words model, a self-trained Word2Vec model, and an automatically trained Word2Vec model. In the bag-of-words model, the individual terms of previous problem descriptions were evaluated without taking grammar into consideration, and compared against terms from current cases. The Jaccard coefficient was used to measure similarity. The second NLP algorithm was trained by the researchers themselves. It is based on a Word2Vec model that vectorizes terms, i.e., classifies them according to previous occurrences. The second Word2Vec model was trained automatically with the aid of Google News. For both Word2Vec models, Word Mover’s Distance was used to measure similarity.
In the evaluation, the self-trained model showed the best performance, while the model trained using Google News exhibited the lowest accuracy. This is due to the fact that independent training, i.e., entering industry-specific terms and abbreviations in the model, leads to more precise results. The results clearly demonstrate that training the models independently is beneficial for accurate matches.
2. Regression-based approach
The regression-based approach is frequently used to predict results based on variables. In Cockpit 4.0, it was adapted to generate a model that can predict the processing time required for an incoming case or problem.
This information can help in prioritizing problems. In the event of time-consuming cases, appropriate measures can be taken to shorten the period of time until the problem is resolved. The start and end dates of cases are generated to determine the duration of each case using supervised learning, i.e., data processing with predefined parameters. Four types of regression models were trained, namely linear regression, lasso regression, a support vector regression model (SVM), and a model tree with SVM. The model tree with support vector regression models achieved the lowest error rate.
Clustering is another machine learning method which can be used to assign different data points to specific groups. In the project, the clustering approach was used to subdivide the data sets into clusters in order to identify similarity patterns. The clusters can provide the user with similar cases via unsupervised learning, which then serve as the starting point for their research in solving the current problem. Three types of clustering algorithms were used, namely DBSCAN, HDBSCAN, and k-means. k-means clustering achieved the best qualitative results.
With the solution developed at Fraunhofer IPK, the researchers succeeded in demonstrating that companies can use data sets for the long-term improvement of development and production processes through machine learning approaches. At the end of the Cockpit 4.0 project, a prototype was created employing these three methods, and was extensively tested and evaluated by end users at Rolls Royce. Above all, they cited the time saved when solving assembly problems as the main benefit. At the same time, it also became clear that the virtual decision support system still needs to be further developed and refined in order to be seamlessly integrated into daily work processes. Such improvements could be subjects of future research projects, which could also benefit other partner companies in the industry.