What is a model?
Merriam-Webster gives many different definitions (from plastic figures to a person walking on a catwalk), but in this post, we are interested in the following definition:
a system of postulates, data, and inferences presented as a mathematical description of an entity or state of affairs
Basically, a model is a set of assumptions that represents a real system mathematically1. Most people reading this post are very familiar with building models, even if they don't recognize them as such. The most widely used modeling tool out there is actually Microsoft Excel, with roughly 750 million users globally. Excel is used to create representations of real systems in a grid format.
Microsoft Excel is a great tool, no question about that2, but anyone who has built larger models in Excel knows the pain that sets in as soon as you try to force in more dimensions than the basic two that the grid provides. For example, you can easily model a warehouse where the columns are products and the rows are time periods. You can also, to some degree, model warehouse resupplies with functions inside the cells, but as soon as you add stochasticity (for instance, demand that fluctuates randomly), Excel becomes very difficult to use, if not impossible.
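To illustrate why stochastic demand pushes past what a grid handles comfortably, here is a minimal sketch of the warehouse example in Python. Everything here is hypothetical: the function name, the demand distribution, and the resupply rule are made up for illustration.

```python
import random

# Hypothetical single-product warehouse with random demand: a few lines of
# code, but awkward to express as formulas in a spreadsheet grid.
def simulate_warehouse(periods=52, start_stock=100,
                       reorder_point=40, order_qty=80, seed=42):
    """Simulate stock levels under normally distributed weekly demand."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    stock = start_stock
    history = []
    for _ in range(periods):
        demand = max(0, round(rng.gauss(20, 5)))  # stochastic demand
        stock = max(0, stock - demand)
        if stock <= reorder_point:                # simple resupply rule
            stock += order_qty
        history.append(stock)
    return history

levels = simulate_warehouse()
print(min(levels), max(levels))
```

In a spreadsheet, each random draw, the resupply condition, and the running stock level would all compete for cells in the same two-dimensional grid; in code, adding a second product or a lead time is just another parameter.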
There are tools that extend the boundaries of Excel3, but even they eventually run into problems. The end-user can always write VBA macros, but this is beyond many users' experience.
What are modeling tools?
Modeling tools are pieces of software that are used to create, well, models. Software engineers and organizations (both commercial and academic) have spent a significant amount of engineering hours to create tools that can more easily create models. Technically, a proficient software engineer could build any model using any general-purpose programming language, but this would require a very significant amount of time. Listing all possible modeling tools would likely be futile, as there are so many different tools available and many of them are built for niche application areas. As such, we will only look into some examples as well as where these tools are used.
As mentioned above, Excel is among the most widely used software tools available. Tools in the spreadsheet sphere are quite platform-specific (Excel, Numbers, LibreOffice Calc), but their main purpose is very similar. Essentially, the user writes functions in cells inside a grid4 to calculate new values from the values in other cells. These tools run on pretty much any device and are quite easy to use, but as soon as more accuracy is required, the ease of use drops dramatically.
People with academic degrees tend to have at least some experience with statistical software from their studies. As a teaching assistant, I did a fair amount of work with SPSS, but there are plenty of other software tools available5. Statistical software can be applied to pretty much any problem domain to build statistical models with ease. A proficient user can get good results out of these tools, as long as they understand the assumptions behind the tests they run. However, proper usage requires a strong background in statistics. And because the tools can be used for any statistical analysis, they tend to end up hiding a long list of use cases behind long drop-down menus.
Simulation modeling is an area in which I have extensive experience, thanks to my dissertation on simulation-based decision support systems. In simulation modeling, the modeler builds a visual representation of a real system using one of the modeling paradigms (models can also be non-visual, but then other tools are used). The problem domain can be pretty much anything. AnyLogic was my main tool of choice back in the day, but there are plenty of other tools available6. Simulation modeling tools resemble statistical software in their area of use. They have a steep learning curve, and proficient usage requires quite a lot of experience and/or training. Technically, it is easy to build a model, but to build a good model the modeler needs enough experience in the field.
CAD software is used to create blueprints of all kinds of objects, whether floor layouts, machines, or even individual screws. These tools7 tend to be relatively generic within their own problem domain (making blueprints), but proficient use requires significant learning.
What are model-based tools?
Model-based tools are tools that contain a model at their core. They are usually built using other tools, and they usually have a very limited sphere of use cases. Some frequent examples include:
- A simulation model, where the user can modify some parameters to see how the system reacts to changes8
- A system where a statistical model is used to calculate a regression value on the fly
- An Excel file where the end-user can modify some cells to see how other cells change
The uniting factors are that all of these tools have a very limited problem domain, are relatively simple to use, and provide accurate results.
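The second bullet above, a statistical model that calculates a regression value on the fly, can be sketched as follows. This is a hypothetical example: the coefficient names and values are invented, and in practice they would come from a model fitted offline by a specialist.

```python
# Hypothetical embedded model: regression coefficients fitted offline with a
# specialist tool are baked in, so the end-user only supplies inputs.
COEFFICIENTS = {"intercept": 12.5, "temperature": 0.8, "pressure": -1.3}

def predict_yield(temperature, pressure):
    """Evaluate the pre-fitted linear model on the fly."""
    return (COEFFICIENTS["intercept"]
            + COEFFICIENTS["temperature"] * temperature
            + COEFFICIENTS["pressure"] * pressure)

print(predict_yield(temperature=70, pressure=2.0))  # 12.5 + 56 - 2.6 = 65.9
```

The embedded model is trivial to use and (within its domain) accurate, but it answers exactly one question; change the process and the coefficients have to be re-fitted.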
If you have followed along to this point of the blog post, you may have noticed that three themes have been discussed: generalizability, accuracy, and ease of use. These three themes form the corners of a modeling triangle. The fascinating thing about the triangle is the constant struggle to find the right balance between the three. If a single tool existed that excelled in all three areas, it would likely dominate all others. Thus, for now, a good tool is one that makes the right compromises for its use case, and that is what makes it user-friendly.
There are three frequent use cases in the modeling triangle: specialist tools, embedded models, and hard-coded models. Specialist tools and embedded models are much more common than hard-coded models, but all three of these extremes are far more prevalent than any balanced alternative.
The common factors amongst specialist tools are the wide range of use cases and the accuracy of the final results. However, these tools tend to be difficult to use. Statistical and simulation software are good examples. In the hands of a competent modeler, these tools can produce great models. The biggest downside is the limited number of engineering hours available: as even the best engineer has limited working hours, there is a clear limit to how many problems they can properly model.
Embedded models are model-based tools. An engineer can use a specialist tool to build, say, a statistical model that is then included in some sort of automated system. These models have good ease of use (in many cases they are fully automated) and tend to provide accurate results. Their main issue is generalizability: they are built for specific use cases, so any given system contains only a limited number of them.
Hard-coded models are very simple models that perform a single action. In automation systems, there is usually an alarm system in place, which sounds an alarm if a value goes above or below a threshold. These models are very general and very easy to use, but their accuracy is extremely low. If my car revved at 4000 rpm while no one was pressing the gas pedal, I would want to hear an alarm, even though 4000 rpm is well below the 6000 rpm that would be a normal value under load. To summarize: context matters.
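The car example can be sketched in a few lines. Both functions and their thresholds are hypothetical; the point is only that the fixed-threshold alarm misses the anomaly because it ignores context.

```python
# Hypothetical hard-coded alarm: one fixed threshold, no context.
RPM_ALARM_THRESHOLD = 6000

def fixed_alarm(rpm):
    """Fires only above the fixed limit, regardless of what the car is doing."""
    return rpm > RPM_ALARM_THRESHOLD

# A context-aware check needs one more input, but catches the 4000 rpm idle.
def contextual_alarm(rpm, pedal_pressed):
    idle_limit = RPM_ALARM_THRESHOLD if pedal_pressed else 1200
    return rpm > idle_limit

print(fixed_alarm(4000))                            # False: anomaly missed
print(contextual_alarm(4000, pedal_pressed=False))  # True: context matters
```

The hard-coded version is maximally general and trivial to use, but accurate only by accident; every extra input that makes it more accurate also makes it more specific.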
What is clear from the three common use cases is that a compromise must be made. It seems that the most frequent use cases require the most extreme solutions, i.e. maximizing two categories while leaving the third hanging. My professional guess is that these types of tools are simply easier to build. You see, an engineer is willing to spend a significant amount of time learning a new tool and thus values accuracy over ease of use. Tools created for other users, in contrast, need to be very simple to use. If an embedded model had the same learning curve as a specialist tool, no end-user would use it. If a hard-coded model were made more accurate, the options would be to make it more difficult to use or to create a more specific model.
For our flagship software solution, Factory Harmonizer, we at SimAnalytics have made a different compromise. As we want the tool to be used by operations, it has to be easy to use. We also want good coverage of various ML models, so the tool has to be relatively general. We accept that our models are not as accurate as those created by an expert with a specialist tool. Yet, looking at what is currently on offer in the market, we feel that our solution provides a unique tool for managing complexity in the process industry.
(1) The Big Book of Simulation Modeling contains a great chapter on what a simulation model is. Most of the text applies to the rest of this post as well, even though we don't concentrate on simulation modeling. https://www.anylogic.com/blog/the-new-big-book-of-simulation-modeling/
(2) There is valid criticism against the safety of Excel. A good post about the issue can be found here: https://inflex.io/blog/whats-wrong-with-the-grid
(4) There are some non-grid-based tools for spreadsheet calculations, such as Inflex: https://inflex.io/
(8) Plenty of examples available at: https://cloud.anylogic.com/