What is a model?
Merriam-Webster gives many different definitions (from plastic figures to a person walking on a catwalk), but in this post, we are interested in the following definition:
a system of postulates, data, and inferences presented as a mathematical description of an entity or state of affairs
Basically, a model is a set of assumptions that represents a real system mathematically1. Most people reading this post are very familiar with building models, even if they don't recognize them as such. The most widely used modeling tool out there is actually Microsoft Excel, with roughly 750 million users globally. Excel is used to create representations of real systems in a grid format.
Microsoft Excel is a great tool, no question about that2, but anyone who has built larger models in Excel knows the pain that sets in as soon as you try to force in more dimensions than the basic two that the grid provides. For example, you can easily model a warehouse where the columns are products and the rows are time periods. You can also, to some degree, model warehouse resupplies with functions inside the cells, but as soon as you add stochasticity (for instance, demand that fluctuates randomly), Excel becomes very difficult to use, if not impossible.
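To illustrate why stochastic demand pushes past what a grid handles comfortably, here is a minimal sketch of the warehouse example in Python. Everything here is hypothetical: the function name, the demand distribution, and the resupply rule are made up for illustration.

```python
import random

# Hypothetical single-product warehouse with random demand: a few lines of
# code, but awkward to express as formulas in a spreadsheet grid.
def simulate_warehouse(periods=52, start_stock=100,
                       reorder_point=40, order_qty=80, seed=42):
    """Simulate stock levels under normally distributed weekly demand."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    stock = start_stock
    history = []
    for _ in range(periods):
        demand = max(0, round(rng.gauss(20, 5)))  # stochastic demand
        stock = max(0, stock - demand)
        if stock <= reorder_point:                # simple resupply rule
            stock += order_qty
        history.append(stock)
    return history

levels = simulate_warehouse()
print(min(levels), max(levels))
```

In a spreadsheet, each random draw, the resupply condition, and the running stock level would all compete for cells in the same two-dimensional grid; in code, adding a second product or a lead time is just another parameter.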
There are tools that extend the boundaries of Excel3, but even they eventually run into problems. The end-user can always write VBA macros, but this is beyond many users' experience.
What are modeling tools?
Modeling tools are pieces of software that are used to create, well, models. Software engineers and organizations (both commercial and academic) have spent a significant amount of engineering hours to create tools that can more easily create models. Technically, a proficient software engineer could build any model using any general-purpose programming language, but this would require a very significant amount of time. Listing all possible modeling tools would likely be futile, as there are so many different tools available and many of them are built for niche application areas. As such, we will only look into some examples as well as where these tools are used.
As mentioned above, Excel is among the most widely used software tools available. Tools in the spreadsheet sphere are quite platform-specific (Excel, Numbers, LibreOffice Calc), but their main purpose is very similar. Essentially, the user writes functions in cells inside a grid4 to calculate new values from the values in other cells. These tools run on pretty much any device and are quite easy to use, but as soon as more accuracy is required, the ease of use drops dramatically.
People with academic degrees tend to have at least some experience with statistical software from their studies. As a teaching assistant, I did a fair amount of work with SPSS, but there are plenty of other software tools available5. Statistical software can be applied to pretty much any problem domain to build statistical models with ease. A proficient user can get good results out of these tools, as long as they understand the assumptions behind the tests they run. However, proper usage requires a strong background in statistics. And because the tools can be used for any statistical analysis, they tend to end up hiding a long list of use cases behind long drop-down menus.
Simulation modeling is an area in which I have extensive experience, thanks to my dissertation on simulation-based decision support systems. In simulation modeling, the modeler builds a visual representation of a real system using one of the modeling paradigms (models can also be non-visual, but then other tools are used). The problem domain can be pretty much anything. AnyLogic was my main tool of choice back in the day, but there are plenty of other tools available6. Simulation modeling tools resemble statistical software in their area of use. They have a steep learning curve, and proficient usage requires quite a lot of experience and/or training. Technically, it is easy to build a model, but to build a good model the modeler needs enough experience in the field.
CAD software is used to create blueprints of all kinds of objects, whether floor layouts, machines, or even individual screws. These tools7 tend to be relatively generic within their own problem domain (making blueprints), but proficient use requires significant learning.
What are model-based tools?
Model-based tools are tools that contain a model at their core. They are usually built using other tools, and they usually have a very limited sphere of use cases. Some frequent examples include:
- A simulation model, where the user can modify some parameters to see how the system reacts to changes8
- A system where a statistical model is used to calculate a regression value on the fly
- An Excel file where the end-user can modify some cells to see how other cells change
The uniting factors are that all of these tools have a very limited problem domain, are relatively simple to use, and provide accurate results.
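The second bullet above, a statistical model that calculates a regression value on the fly, can be sketched as follows. This is a hypothetical example: the coefficient names and values are invented, and in practice they would come from a model fitted offline by a specialist.

```python
# Hypothetical embedded model: regression coefficients fitted offline with a
# specialist tool are baked in, so the end-user only supplies inputs.
COEFFICIENTS = {"intercept": 12.5, "temperature": 0.8, "pressure": -1.3}

def predict_yield(temperature, pressure):
    """Evaluate the pre-fitted linear model on the fly."""
    return (COEFFICIENTS["intercept"]
            + COEFFICIENTS["temperature"] * temperature
            + COEFFICIENTS["pressure"] * pressure)

print(predict_yield(temperature=70, pressure=2.0))  # 12.5 + 56 - 2.6 = 65.9
```

The embedded model is trivial to use and (within its domain) accurate, but it answers exactly one question; change the process and the coefficients have to be re-fitted.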
If you have followed along to this point of the blog post, you may have noticed that three themes have been discussed: generalizability, accuracy, and ease of use. These three themes form the corners of a modeling triangle. The fascinating thing about the triangle is the constant struggle to find the right balance between the three. If a single tool existed that excelled in all three areas, it would likely dominate all others. Thus, for now, a good tool is one that makes the right compromises for its use case, and that is what makes it user-friendly.
There are three frequent use cases in the modeling triangle: specialist tools, embedded models, and hard-coded models. Specialist tools and embedded models are much more common than hard-coded models, but all three of these extremes are far more prevalent than any balanced alternative.
The common factors amongst specialist tools are the wide range of use cases and the accuracy of the final results. However, these tools tend to be difficult to use. Statistical and simulation software are good examples. In the hands of a competent modeler, these tools can produce great models. The biggest downside is the limited number of engineering hours available: as even the best engineer has limited working hours, there is a clear limit to how many problems they can properly model.
Embedded models are model-based tools. An engineer can use a specialist tool to build, say, a statistical model that is then included in some sort of automated system. These models have good ease of use (in many cases they are fully automated) and tend to provide accurate results. Their main issue is generalizability: they are built for specific use cases, so any given system contains only a limited number of them.
Hard-coded models are very simple models that perform a single action. In automation systems, there is usually an alarm system in place, which sounds an alarm if a value goes above or below a threshold. These models are very general and very easy to use, but their accuracy is extremely low. If my car revved at 4000 rpm while no one was pressing the gas pedal, I would want to hear an alarm, even though 4000 rpm is well below the 6000 rpm that would be a normal value under load. To summarize: context matters.
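The car example can be sketched in a few lines. Both functions and their thresholds are hypothetical; the point is only that the fixed-threshold alarm misses the anomaly because it ignores context.

```python
# Hypothetical hard-coded alarm: one fixed threshold, no context.
RPM_ALARM_THRESHOLD = 6000

def fixed_alarm(rpm):
    """Fires only above the fixed limit, regardless of what the car is doing."""
    return rpm > RPM_ALARM_THRESHOLD

# A context-aware check needs one more input, but catches the 4000 rpm idle.
def contextual_alarm(rpm, pedal_pressed):
    idle_limit = RPM_ALARM_THRESHOLD if pedal_pressed else 1200
    return rpm > idle_limit

print(fixed_alarm(4000))                            # False: anomaly missed
print(contextual_alarm(4000, pedal_pressed=False))  # True: context matters
```

The hard-coded version is maximally general and trivial to use, but accurate only by accident; every extra input that makes it more accurate also makes it more specific.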
What is clear from the three common use cases is that a compromise must be made. It seems that the most frequent use cases require the most extreme solutions, i.e. maximizing two categories while leaving the third hanging. My professional guess is that these types of tools are simply easier to build. You see, an engineer is willing to spend a significant amount of time learning a new tool and thus values accuracy over ease of use. Tools created for other users, in contrast, need to be very simple to use. If an embedded model had the same learning curve as a specialist tool, no end-user would use it. If a hard-coded model were made more accurate, the options would be to make it more difficult to use or to create a more specific model.
For our flagship software solution, Factory Harmonizer, we at SimAnalytics have made a different compromise. As we want the tool to be used by operations, it has to be easy to use. We also want good coverage of various ML models, so the tool has to be relatively general. We accept that our models are not as accurate as those created by an expert with a specialist tool. Yet, looking at what is currently on offer in the market, we feel that our solution provides a unique tool for managing complexity in the process industry.
(1) The Big Book of Simulation Modeling contains a great chapter on what a simulation model is. Most of the text applies to the rest of this post as well, even though we don't concentrate on simulation modeling. https://www.anylogic.com/blog/the-new-big-book-of-simulation-modeling/
(2) There is valid criticism against the safety of Excel. A good post about the issue can be found here: https://inflex.io/blog/whats-wrong-with-the-grid
(4) There are some non-grid-based tools for spreadsheet calculations, such as Inflex: https://inflex.io/
(8) Plenty of examples available at: https://cloud.anylogic.com/