OPUS 4 | Suchen

Toward the Digitalization of Auditing: Applying Machine Learning for Information Extraction from Invoices (2024)

Krieger, Felix Friedrich Anton

Artificial intelligence, most prominently in the form of machine learning, is shaping up to be one of the most transformational technologies of the 21st century. Auditors are among the professions forecasted to be the most affected by artificial intelligence, as the profession encompasses many highly structured and repetitive tasks. Automating such tasks would naturally increase the efficiency of financial statement audits. By allowing auditors to focus on higher value-added tasks, and the capability to analyze large volumes of data at a fracture of the time a human would need, artificial intelligence would also benefit the effectiveness of auditing. Despite these benefits, to this day, the actual adoption of artificial intelligence in the audit domain remains rather limited. The audit profession is highly regulated and has to consider requirements regarding, e.g. the application of professional standards, codes of conduct, and data protection obligations. Hence, the question arises of how audit firms can be supported in their efforts to adopt artificial intelligence and how machine learning systems can be designed to comply with the specific demands of the audit domain. The goal of this dissertation is to better understand the adoption of artificial intelligence in the audit domain and to actively support the adoption of artificial intelligence in auditing based on this understanding. To this end, we employ a mixture of research methods. On the one hand, the research presented here adopts a qualitative approach, examining the adoption of artificial intelligence and other advanced analytical technologies of the audit domain through taxonomy development and grounded theory. The findings of these studies inspire the second stream of work within this dissertation, which adopts a quantitative and design-oriented approach: It focuses on using machine learning to extract information from invoices for tests of details. Tests of details are essential substantive audit procedures used in nearly every audit. This dissertation proposes a new machine learning model architecture for information extraction from invoices, compares different machine learning models, and proposes design principles for machine learning pipelines for an audit application addressing the test of details through action design research.

A probabilistic view on transport and mixing in closed and open flows (2023)

Klünker, Anna

The computational analysis and the optimization of transport and mixing processes in fluid flows are of ongoing scientific interest. Transfer operator methods are powerful tools for the study of these processes in dynamical systems. The focus in this context has been mostly on closed dynamical systems and the main applications have been geophysical flows. In this thesis, the authors consider transport and mixing in closed flow systems and in open flow systems that mimic technical mixing devices. Via transfer operator methods, They study the coherent behavior in closed example systems including a turbulent Rayleigh-Bénard convection flow and consider the finite-time mixing of two fluids. They extend the transfer operator framework to specific open flows. In particular, they study time-periodic open flow systems with constant inflow and outflow of fluid particles and consider several example systems. In this case, the transfer operator is represented by a transition matrix of a time-homogeneous absorbing Markov chain restricted to finite transient states. The chaotic saddle and its stable and unstable manifolds organize the transport processes in open systems. The authors extract these structures directly from leading eigenvectors of the transition matrix. For a constant source of two fluids in different colors, the mass distribution in the mixer and its outlet region converges to an invariant mixing pattern. In parameter studies, they quantify the degree of mixing of the resulting patterns by several mixing measures. More recently, network-based methods that construct graphs on trajectories of fluid particles have been developed to study coherent behavior in fluid flow. They use a method based on diffusion maps to extract organizing structures in open example systems directly from trajectories of fluid particles and extend this method to describe the mixing of two types of fluids.

Detecting and Assessing Road Damages for Autonomous Driving Utilizing Conventional Vehicle Sensors (2021)

Kortmann, Felix

Environmental perception is one of the biggest challenges in autonomous driving to move inside complex traffic situations properly. Perceiving the road's condition is necessary to calculate the drivable space; in manual driving, this is realized by the human visual cortex. Enabling the vehicle to detect road conditions is a critical and complex task from many perspectives. The complexity lies on the one hand in the development of tools for detecting damage, ideally using sensors already installed in the vehicle, and on the other hand, in integrating detected damages into the autonomous driving task and thus into the subsystems of autonomous driving. High-Definition Feature Maps, for instance, should be prepared for mapping road damages, which includes online and in-vehicle implementation. Furthermore, the motion planning system should react based on the detected damages to increase driving comfort and safety actively. Road damage detection is essential, especially in areas with poor infrastructure, and should be integrated as early as possible to enable even less developed countries to reap the benefits of autonomous driving systems. Besides the application in autonomous driving, an up-to-date solution on assessing road conditions is likewise desirable for the infrastructure planning of municipalities and federal states to make optimal use of the limited resources available for maintaining infrastructure quality. Addressing the challenges mentioned above, the research approach of this work is pragmatic and problem-solving. In designing technical solutions for road damage detection, the researchers conduct applied research methods in engineering, including modeling, prototyping, and field studies. They utilize design science research to integrate road damages in an end-to-end concept for autonomous driving while drawing on previous knowledge, the application domain requirements, and expert workshops. This thesis provides various contributions to theory and practice. The investigators design two individual solutions to assess road conditions with existing vehicle sensor technology. The first solution is based on calculating the quarter-vehicle model utilizing the vehicle level sensor and an acceleration sensor. The novel model-based calculation measures the road elevation under the tires, enabling common vehicles to assess road conditions with standard hardware. The second solution utilizes images from front-facing vehicle cameras to detect road damages with deep neural networks. Despite other research in this area, the algorithms are designed to be applicable on edge devices in autonomous vehicles with limited computational resources while still delivering cutting-edge performance. In addition, the analyses of deep learning tools and the introduction of new data into training provide valuable opportunities for researchers in other application areas to develop deep learning algorithms to optimize detection performance and runtime. Besides detecting road damages, the authors provide novel algorithms for classifying the severity of road damages to deliver additional information for improved motion planning. Alongside the technical solutions, they address the lack of an end-to-end solution for road damages in autonomous driving by providing a concept that starts from data generation and ends with servicing the vehicle motion planning. This includes solutions for detecting road damages, assessing their severity, aggregating the data in the vehicle and a cloud platform, and making the data available via that platform to other vehicles. Fundamental limitations in this dissertation are due to boundaries in modeling. The pragmatic approach simplifies reality, which always distorts the degree of truth in the result.

Computing Efficient Data Summaries (2022)

Mair, Sebastian

Extracting meaningful representations of data is a fundamental problem in machine learning. Those representations can be viewed from two different perspectives. First, there is the representation of data in terms of the number of data points. Representative subsets that compactly summarize the data without superfluous redundancies help to reduce the data size. Those subsets allow for scaling existing learning algorithms up without approximating their solution. Second, there is the representation of every individual data point in terms of its dimensions. Often, not all dimensions carry meaningful information for the learning task, or the information is implicitly embedded in a low-dimensional subspace. A change of representation can also simplify important learning tasks such as density estimation and data generation. This thesis deals with the aforementioned views on data representation and contributes to them. The authors first focus on computing representative subsets for a matrix factorization technique called archetypal analysis and the setting of optimal experimental design. For these problems, they motivate and investigate the usability of the data boundary as a representative subset. The authors also present novel methods to efficiently compute the data boundary, even in kernel-induced feature spaces. Based on the coreset principle, they derive another representative subset for archetypal analysis, which provides additional theoretical guarantees on the approximation error. Empirical results confirm that all compact representations of data derived in this thesis perform significantly better than uniform subsets of data. In the second part of the thesis, the research group is concerned with efficient data representations for density estimation. The researchers analyze spatio-temporal problems, which arise, for example, in sports analytics, and demonstrate how to learn (contextual) probabilistic movement models of objects using trajectory data. Furthermore, they highlight issues of interpolating data in normalizing flows, a technique that changes the representation of data to follow a specific distribution. The authors show how to solve this issue and obtain more natural transitions on the example of image data.

Leveraging Enterprise Architecture for Data-Driven Business Model Innovation (2021)

Rashed, Faisal

Maximizing the value from data has become a key challenge for companies as it helps improve operations and decision making, enhances products and services, and, ultimately, leads to new business models. While enterprise architecture (EA) management and modeling have proven their value for IT-related projects, the support of enterprise architecture for data-driven business models (DDBMs) is a rather new and unexplored field. The research group argues that the current understanding of the intersection of data-driven business model innovation and enterprise architecture is incomplete because of five challenges that have not been addressed in existing research: (1) lack of knowledge of how companies design and realize data-driven business models from a process perspective, (2) lack of knowledge on the implementation phase of data-driven business models, (3) lack of knowledge on the potential support enterprise architecture modeling and management can provide to data-driven business model endeavors, (4) lack of knowledge on how enterprise architecture modeling and management support data-driven business model design and realization in practice, (5) lack of knowledge on how to deploy data-driven business models. The researchers address these challenges by examining how enterprise architecture modeling and management can benefit data-driven business model innovation. The mixed-method approach of this thesis draws on a systematic literature review, qualitative empirical research as well as the design science research paradigm. The investigators conducted a systematic literature search on data-driven business models and enterprise architecture. Considering the novelty of data-driven business models for academia and practice, they conducted explorative qualitative research to explain "why" and "how" companies embark on realizing data-driven business models. Throughout these studies, the primary data source was semi-structured interviews. In order to provide an artifact for DDBM innovation, the researchers developed a theory for design and action. The data-driven business model innovation artifact was inductively developed in two design iterations based on the design science paradigm and the design science research framework.

Supporting Therapy Success by Developing Predictive Models in E-Mental-Health (2021)

Bremer, Vincent

Mental health is an important factor in an individuals' life. Online-based interventions have been developed for the treatment of various mental disorders. During these interventions, a large amount of patient-specific data is gathered that can be utilized to increase treatment outcomes by informing decision-making processes of psychotherapists, experts in the field, and patients. The articles included in this dissertation focus on the analysis of such data collected in digital psychological treatments by using machine learning approaches. This dissertation utilizes various machine learning methods such as Bayesian models, regularization techniques, or decision trees to predict different psychological factors, such as mood or self-esteem, dropout of patients, or treatment outcomes and costs. These models are evaluated using a variety of performance metrics, for example, receiver operating characteristics curve, root mean square error, or specialized performance metrics for Bayesian inference. These types of analyses can support decision- making for psychologists and patients, which can, in turn, lead to better recommendations and subsequently to increased outcomes for patients and simultaneously more insight about the interplay between psychological factors. The analysis of user journey data has not yet been fully examined in the field of psychological research. A process for this endeavor is developed and a technical implementation is provided for the research community. The application of machine learning in this context is still in its infancy. Thus, another contribution is the exploration and application of machine learning techniques for the revelation of correlations between psychological factors or characteristics and treatment outcomes as well as their prediction. Additionally, economic factors are predicted to develop a process for treatment type recommendations. This approach can be utilized for finding the optimal treatment type for patients on an individual level considering predicted treatment outcomes and costs. By evaluating the predictive accuracy of multiple machine learning techniques based on various performance metrics, the importance of considering heterogeneity among patients' behavior and affect is highlighted in some articles. Furthermore, the potential of machine learning-based decision support systems in clinical practice has been examined from a psychotherapists' point of view.

Analysis of User Behavior (2020)

Boubekki, Ahcène

Online behaviors analysis consists of extracting patterns from server-logs. The works presented here were carried out within the "mBook" project which aimed to develop indicators of the quantity and quality of the learning process of pupils from their usage of an eponymous electronic textbook for History. In this thesis, the research group investigates several models that adopt different points of view on the data. The studied methods are either well established in the field of pattern mining or transferred from other fields of machine learning and data mining. The authors improve the performance of archetypal analysis in large dimensions and apply it to unveil correlations between visibility time of particular objects in the e-textbook and pupils' motivation. They present next two models based on mixtures of Markov chains. The first extracts users' weekly browsing patterns. The second is designed to process essions at a fine resolution, which is sine qua non to reveal the significance of scrolling behaviors. The authors also propose a new paradigm for online behaviors analysis that interprets sessions as trajectories within the page-graph. In this respect, they establish a general framework for the study of similarity measures between spatio-temporal trajectories, for which the study of sessions is a particular case. Finally, they construct two centroid-based clustering methods using neural networks and thus lay the foundations for unsupervised behaviors analysis using neural networks.

Robustness of Centrality Measures (2020)

Martin, Christoph

Technological development made it possible to store and process data on a scale not imaginable decades ago — a development that also includes network data. A particular characteristic of network data is that, unlike standard data, the objects of interest, called nodes, have relationships to (possibly all) other objects in the network. Collecting empirical data is often complicated and cumbersome, hence, the observed data are typically incomplete and might also contain other types of errors. Because of the interdependent structure of network data, these errors have a severe impact on network analysis methods. This cumulative dissertation is about the impact of erroneous network data on centrality measures, which are methods to assess the position of an object, for example a person, with respect to all other objects in a network. Existing studies have shown that even small errors can substantially alter these positions. The impact of errors on centrality measures is typically quantified using a concept called robustness. The articles included in this dissertation contribute to a better understanding of the robustness of centrality measures in several aspects. It is argued why the robustness needs to be estimated and a new method is proposed. This method allows researchers to estimate the robustness of a centrality measure in a specific network and can be used as a basis for decision making. The relationship between network properties and the robustness of centrality measures is analyzed. Experimental and analytical approaches show that centrality measures are often more robust in networks with a larger average degree. The study of the impact of non-random errors on the robustness suggests that centrality measures are often more robust if missing nodes are more likely to belong to the same community compared to missingness completely at random. For the development of imputation procedures based on machine learning techniques, a process for the evaluation of node embedding methods is proposed.

Analyzing paid search campaigns using keyword-level data and Bayesian statistics (2018)

Blask, Tobias-Benedikt

Online marketing, especially Paid Search Advertising, has become one of the most important paid media channels for companies to sell their products and services online. Despite being under intensive examination by a number of researchers for several years, this topic still offers interesting opportunities to contribute to the community, particularly because of its large economic impact and practical relevance as well as the detailed and widely unfiltered view of consumer behavior that such marketing offers. To provide answers to some of the important questions from advertisers in this context, the author present four papers in his thesis, in which he extends previous works on optimization topics such as click and conversion prediction. He applies and extends methods from other fields of research to specific problems in Paid Search. After a short introduction, the dissertation starts with a paper in which the authors illustrates a new method that helps advertisers to predict conversion probabilities in Paid Search using sparse keyword-level data. They address one of the central problems in Paid search advertising, which is optimizing own investments in this channel by placing bids in keyword auctions. In many cases, evaluations and decisions are made with extremely sparse data, although anecdotal evidence suggests that online marketing is a typical "Big Data" topic. In the developed algorithm presented in this paper, the authors use information such as the average time that users spend on the advertiser's website and bounce rates for every given keyword. This previously unused data set is shared between all keywords and used as prior knowledge in the proposed model. A modified version of this algorithm is now the core prediction engine in a productive Paid Search Bid Optimization System that calculates and places millions of bids every day for some of the most recognized retailers and service providers in the German market. Next, the author illustrates the development of a non-reactive experimental method for A/B testing of Paid Search Advertising activities. In that paper, the authors provide an answer to the question of whether and under what circumstances it makes economic sense for brand owners to pay for Paid Search ads for their own brand keywords in Google AdWords auctions. Finally, the author presents two consecutive papers with the same theoretical foundation in which he applies Bayesian methods to evaluate the impact of specific text features in Paid Search Advertisements.

Open Access

Filtern

Autor

Erscheinungsjahr

Volltext vorhanden

Schlagworte

Institut

9 Treffer