OPUS 4 | Search

2 results

Computing Efficient Data Summaries (2022)

Mair, Sebastian

Extracting meaningful representations of data is a fundamental problem in machine learning. These representations can be viewed from two perspectives. First, data can be represented in terms of the number of data points. Representative subsets that compactly summarize the data without superfluous redundancy help to reduce the data size. Such subsets allow existing learning algorithms to scale up without approximating their solutions. Second, each individual data point can be represented in terms of its dimensions. Often, not all dimensions carry meaningful information for the learning task, or the information is implicitly embedded in a low-dimensional subspace. A change of representation can also simplify important learning tasks such as density estimation and data generation. This thesis addresses and contributes to both views on data representation. The author first focuses on computing representative subsets for archetypal analysis, a matrix factorization technique, and for the setting of optimal experimental design. For these problems, the thesis motivates and investigates the use of the data boundary as a representative subset. It also presents novel methods to efficiently compute the data boundary, even in kernel-induced feature spaces. Based on the coreset principle, the author derives another representative subset for archetypal analysis that provides additional theoretical guarantees on the approximation error. Empirical results confirm that all compact data representations derived in this thesis perform significantly better than uniformly sampled subsets of the data. The second part of the thesis is concerned with efficient data representations for density estimation. The author analyzes spatio-temporal problems, which arise, for example, in sports analytics, and demonstrates how to learn (contextual) probabilistic movement models of objects from trajectory data.
Furthermore, the author highlights issues with interpolating data in normalizing flows, a technique that changes the representation of data so that it follows a specific distribution, and shows how to resolve them to obtain more natural transitions, using image data as an example.
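
In archetypal analysis, every archetype is a convex combination of data points. Every point inside the convex hull is itself a convex combination of the hull's vertices, so building archetypes only from boundary points leaves the set of feasible archetypes unchanged. The sketch below illustrates this idea under a loud assumption: the data are two-dimensional and the "data boundary" is taken to be the set of convex hull vertices. The hypothetical `convex_hull` helper uses Andrew's monotone chain algorithm. It is not the method from the thesis, which targets higher dimensions and kernel-induced feature spaces.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """2D cross product of vectors OA and OB; > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Hypothetical helper: hull vertices in counter-clockwise order.

    Assumption: the 2D convex hull stands in for the "data boundary";
    collinear points on an edge are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:  # build the lower chain from left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):  # build the upper chain from right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# Toy data: four corners plus two interior points (made up for illustration).
data = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5), (0.2, 0.7)]
print(convex_hull(data))  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```

The interior points are discarded and only the four corners remain as the representative subset. A solver for archetypal analysis would then choose its archetype weights over this smaller set.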

Analysis of User Behavior (2020)

Boubekki, Ahcène

Online behavior analysis consists of extracting patterns from server logs. The work presented here was carried out within the "mBook" project, which aimed to develop indicators of the quantity and quality of pupils' learning processes from their usage of an eponymous electronic history textbook. In this thesis, the author investigates several models that adopt different points of view on the data. The studied methods are either well established in the field of pattern mining or transferred from other fields of machine learning and data mining. The author improves the performance of archetypal analysis in high dimensions and applies it to unveil correlations between the visibility time of particular objects in the e-textbook and pupils' motivation. Next, two models based on mixtures of Markov chains are presented. The first extracts users' weekly browsing patterns. The second is designed to process sessions at a fine resolution, which is a sine qua non for revealing the significance of scrolling behavior. The author also proposes a new paradigm for online behavior analysis that interprets sessions as trajectories within the page graph. In this respect, the thesis establishes a general framework for studying similarity measures between spatio-temporal trajectories, of which sessions are a special case. Finally, two centroid-based clustering methods are constructed using neural networks, laying the foundations for unsupervised behavior analysis with neural networks.
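
The mixture models mentioned above are built from Markov chains over page visits. The sketch below shows only one building block, and it rests on two assumptions: sessions are lists of page identifiers, and the page names are invented. The hypothetical `fit_markov_chain` function estimates first-order transition probabilities by counting consecutive page pairs. Its optional `weights` argument lets each session count fractionally. This is how the M-step of an EM algorithm for a mixture of Markov chains re-estimates one component from its responsibilities. Initial-state probabilities, the weekly model and the fine-resolution scrolling model from the thesis are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence

Chain = Dict[str, Dict[str, float]]


def fit_markov_chain(
    sessions: Sequence[Sequence[str]],
    weights: Optional[Sequence[float]] = None,
) -> Chain:
    """Hypothetical helper: estimate P(next page | current page).

    Each consecutive pair of pages adds its session's weight to a transition
    count. Rows are then normalized to sum to 1. Pages that are never
    followed by another page get no row.
    """
    if weights is None:
        weights = [1.0] * len(sessions)
    counts: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for session, w in zip(sessions, weights):
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += w
    chain: Chain = {}
    for current, row in counts.items():
        total = sum(row.values())
        if total > 0:  # skip rows whose weights are all zero
            chain[current] = {nxt: c / total for nxt, c in row.items()}
    return chain


# Invented e-textbook sessions; not data from the mBook project.
sessions = [
    ["home", "ch1", "ch2"],
    ["home", "ch1", "home"],
    ["home", "ch2"],
]
chain = fit_markov_chain(sessions)
print(chain["home"])  # {'ch1': 0.6666666666666666, 'ch2': 0.3333333333333333}
```

In a mixture with K components, the same function would be called once per component, with `weights` set to each session's responsibility for that component.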