Comparing Graph Analytic Approaches to Analyze Enterprise Architecture Models

Introduction

An enterprise is a complex system comprising human, software, hardware, and technology components. It is difficult for any single person to understand how these parts work together. Enterprise architecture (EA) addresses this by creating a visual representation of the enterprise at different levels and by providing a road map. EA describes the interaction between the IT components and the business processes or activities of a corporation. Its purpose is to define the structure and operation of an organization.

EA models are used to communicate information about the EA between stakeholders and across the layers of the EA, namely the enterprise’s business, information, and technology environments. Enterprise architecture management (EAM) is used to plan, develop, understand, and control an organization’s architecture.

EAM is a set of practices that help to improve the quality of decision making. ArchiMate is an open and independent modeling language for enterprise architecture that describes the construction and operation of business processes, information flows, and organizational standards. The ArchiMate language helps enterprise architects to describe, analyze, and visualize the relationships among architecture domains. Figure 1 shows the ArchiMate framework. Archi is an open-source visual modelling and design tool for building ArchiMate models and sketches.

Motivation

The main motivation behind this thesis work is to improve quality in EAM. Since the components in a repository correspond to vertices and the relationships between components correspond to edges, analyzing EA models as a network graph is an efficient approach. Graph analytic methods can provide a deeper understanding of the relations between the components. In general, an EA model repository can become polluted when duplicate or nearly identical models are added to it. This causes unnecessary replication of components and in turn inflates the repository. A possible solution is to evaluate EA models in order to find similar components in a repository by applying machine learning and graph analytic/social network analysis methods, and to explore insights from the models present in the repository. Mining such EA models can provide a graph-based recommendation or decision-making system that is consulted before models are added to the repository.

Research problem/question

In EAM, informed decision making requires understanding the structure of the components and the relationships between them. Decision making can be complex in large organizations dealing with many stakes and situations. The main problem is to extract insights from the components present in the repository and to minimize duplicates before a model is added to the repository. As the repository grows, matching graphs based on a proximity measure becomes more challenging. Graph data also requires some analysis and preprocessing effort before graph analytic methods can be applied. It is also important to compare graph analytic approaches in order to validate which algorithms or methods give better results.

Related work

The work of Vasil presents how machine learning techniques can be used to evaluate enterprise architecture models. The focus is mainly on using unsupervised learning methods to find patterns in unlabelled data and to find feature similarities.

The first method is attribute-based similarity, where similarity is calculated between the names, types, and descriptions of different models. A score between 0 and 1 indicates how similar or dissimilar the objects of two models are. Figure 2 summarizes the feature similarity measures for types, names, and descriptions. Finally, the separate feature similarities are combined using a weighted similarity function.

The next method uses a structural (topological) similarity measure to compare models inside the repository, that is, similarity based on node position. SimRank is one such pairwise measure of structural similarity. The main idea is that two objects are similar if they are referenced by similar objects. The drawback of this approach is that it takes a long time to compute the similarities. A possible solution is to use a community detection algorithm or to perform random walks on the graph.
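To make the SimRank idea concrete, the following is a naive iterative sketch. The decay factor, the iteration count, and the example edges are assumptions chosen for illustration. Every iteration visits all node pairs and all pairs of their in-neighbours, which illustrates why the computation becomes slow on larger repositories.

```python
from itertools import product


def simrank(edges, decay=0.8, iterations=10):
    """Naive SimRank on a directed graph given as (source, target) pairs.

    Returns a dict mapping (a, b) to a similarity score in [0, 1].
    """
    nodes = sorted({n for edge in edges for n in edge})
    in_nbrs = {n: set() for n in nodes}
    for src, dst in edges:
        in_nbrs[dst].add(src)

    # Start from the identity: a node is only similar to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
            elif not in_nbrs[a] or not in_nbrs[b]:
                # Nodes that nothing references cannot be similar to others.
                new_sim[(a, b)] = 0.0
            else:
                total = sum(sim[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
                new_sim[(a, b)] = decay * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new_sim
    return sim


# Illustrative edges: one process references two application components.
example_edges = [("Process", "AppA"), ("Process", "AppB")]
scores = simrank(example_edges)
```

In this example AppA and AppB are both referenced only by Process, so their score equals the decay factor, while Process has no in-neighbours and scores 0 against both.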

The last method is based on association rule mining. The idea is to convert Archi models of the same domain into transactions so that association rule mining techniques can be applied to obtain the components of high interest. The approaches are evaluated for correctness.

Methodology

A prototype ArchiMate model is built for the reference model repository using Archi, an open-source EA modelling tool. The model is exported to CSV format, where elements and relations are saved in separate files. It is convenient to explore EA models as a network graph, since a graph represents the relations between the elements in a model. To satisfy the objective of holistic knowledge representation and decision making, it is necessary to mine the graph-structured data to extract insights from the models. Gephi or R can be used to analyze the graph and compute basic graph statistics such as node degree, average path length, and node ranking. Different machine learning algorithms can also be used to predict the relationships between the components; the algorithms are then evaluated on prediction accuracy. A text similarity approach can be used to determine how similar the names of the components in different architecture models are.
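The step from the exported CSV files to basic graph statistics can be sketched in a few lines. The example uses only the Python standard library rather than Gephi or R. The column names (ID, Name, Source, Target) and the sample rows are assumptions about the export layout and should be checked against the actual files.

```python
import csv
import io
from collections import deque

# Assumed layout of the two exported files (sample rows are invented).
ELEMENTS_CSV = '''"ID","Type","Name","Documentation"
"e1","BusinessProcess","Handle Claim",""
"e2","ApplicationComponent","Claims System",""
"e3","ApplicationComponent","Document Store",""
"e4","Node","Database Server",""
'''

RELATIONS_CSV = '''"ID","Type","Name","Documentation","Source","Target"
"r1","ServingRelationship","","","e2","e1"
"r2","ServingRelationship","","","e3","e2"
"r3","RealizationRelationship","","","e4","e3"
'''


def build_graph(elements_file, relations_file):
    """Read elements and relations into an undirected adjacency map."""
    names = {row["ID"]: row["Name"] for row in csv.DictReader(elements_file)}
    adjacency = {node_id: set() for node_id in names}
    for row in csv.DictReader(relations_file):
        src, dst = row["Source"], row["Target"]
        # Skip self-loops and relations that point at unknown elements.
        if src != dst and src in adjacency and dst in adjacency:
            adjacency[src].add(dst)
            adjacency[dst].add(src)
    return names, adjacency


def degrees(adjacency):
    """Number of neighbours per node."""
    return {node: len(nbrs) for node, nbrs in adjacency.items()}


def average_path_length(adjacency):
    """Mean shortest-path length over all connected node pairs (BFS)."""
    total, pairs = 0, 0
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nbr in adjacency[current]:
                if nbr not in dist:
                    dist[nbr] = dist[current] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


names, graph = build_graph(io.StringIO(ELEMENTS_CSV), io.StringIO(RELATIONS_CSV))
```

For real exports, the io.StringIO objects would be replaced by opened files. Relations are treated as undirected here, which is a simplification; a directed variant would keep separate in- and out-neighbour sets.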

The entire task can be divided into four stages. Figure 3 summarizes the steps.

Different modularity optimization methods and clustering algorithms, such as Markov Clustering, Louvain, Walktrap, spectral clustering, and Infomap, are used to detect communities in the graph structure. The choice of method can be based on the highest modularity score.
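To illustrate how candidate partitions can be compared by modularity, the sketch below computes the modularity score Q = sum over communities c of (L_c / m - (d_c / 2m)^2) for an undirected, unweighted graph. Here m is the number of edges, L_c the number of edges inside community c, and d_c the sum of the degrees of its nodes. The example graph and the two partitions are assumptions; in practice the partitions would come from the algorithms listed above.

```python
def modularity(edges, membership):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    membership: dict mapping every node to a community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0

    intra = {}       # edges with both endpoints in the same community
    degree_sum = {}  # sum of node degrees per community
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1

    return sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Illustrative graph: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

candidates = {"two_groups": two_groups, "one_group": one_group}
best = max(candidates, key=lambda name: modularity(edges, candidates[name]))
```

In this example the split into the two triangles scores higher than placing all nodes in one community, so it would be the partition selected.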

The similarity between two networks can be calculated using network parameters such as clustering coefficient, betweenness, node degree, and shortest path. Similarity measures such as the Jaccard index, Dice, and inverse log-weighted (invlogweighted) calculate similarity scores for vertices based on their connection patterns.
In addition to the unsupervised methods, semi-supervised algorithms are used to perform link prediction and node classification.
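The three vertex similarity measures named above can be written directly from their definitions. The sketch below assumes an undirected graph stored as a dict of neighbour sets; the example graph is invented. The inverse log-weighted score sums 1 / log(degree) over the common neighbours, so a shared neighbour with few connections counts more than a shared hub.

```python
import math


def jaccard(adj, a, b):
    """Shared neighbours divided by all neighbours of either vertex."""
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0


def dice(adj, a, b):
    """Twice the shared neighbours divided by the sum of both degrees."""
    size = len(adj[a]) + len(adj[b])
    return 2 * len(adj[a] & adj[b]) / size if size else 0.0


def inverse_log_weighted(adj, a, b):
    """Common neighbours weighted by the inverse log of their degree."""
    return sum(
        1 / math.log(len(adj[w]))
        for w in adj[a] & adj[b]
        if len(adj[w]) > 1  # log(1) is 0, so degree-1 neighbours are skipped
    )


# Illustrative undirected graph as neighbour sets.
adj = {
    "A": {"X", "Y"},
    "B": {"X", "Y", "Z"},
    "X": {"A", "B"},
    "Y": {"A", "B"},
    "Z": {"B"},
}
```

Here A and B share two of three distinct neighbours, giving a Jaccard score of 2/3 and a Dice score of 0.8. Scores like these can also rank candidate pairs for link prediction, since vertices with many shared neighbours are more likely to be connected.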

Conclusion

Analyzing an EA model as a network graph is an efficient way to gain insights from an Archi model. Social network analysis methods can be used to extract graph feature statistics. Graph analytic approaches can be used to compare models inside the repository and to discover duplicate or nearly identical models. Machine learning algorithms are used to predict links in the architecture models and are evaluated to obtain the best prediction accuracy. Comparing the results of different clustering or community detection algorithms and similarity metrics can provide a better understanding of the results obtained. Gephi or R can be used to visualize the EA model, and the igraph R package supports most of the graph analytic methods.