Schedule for Spring 2013 - Kyoto University Informatics Seminar

Schedule for Spring 2013

Informatics Seminar, Perspectives in Informatics 5
Unless noted otherwise, all talks 16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus

    • David Rappaport (Queen's), Triangles and T. rex: Digital technology in support of museum services, hosted by David Avis
    • Antoine Deza (McMaster), Colourful linear programming and simplicial depth, hosted by David Avis
    • Sanjeev Arora (Princeton), Is Machine Learning Tractable? — Three Vignettes, hosted by Kazuo Iwama
  • May 2 Thursday, regular replacement day for April 29 (Mon)
    • Alejandro Ribes (EDF R&D), Image Spectrometers for the scanning of fine-art paintings, hosted by Xuefeng Liang
  • May 9 Thursday, Irregular day
    • Mia Persson (Malmö University), A fast parallel algorithm for minimum-cost small integral flows, hosted by Kazuo Iwama
    • Stanko Trifkovic (Kyoto U), Spatial Distribution and Hidden Trees in Bitterlich’s Angle-Count Sampling
    • Marco Cuturi (Kyoto U), Mean Reversion with a Variance Threshold
    • Aapo Hyvärinen (University of Helsinki, ATR), Analysing brain waves by unsupervised learning methods, hosted by Akihiro Yamamoto
    • Xuefeng Liang (Kyoto U), Segmentation of multiple interdependent motions using potential surface
    • Adam Jatowt (Kyoto U), Estimating focus time of documents
    • Francois Le Gall (University of Tokyo), Matrix Multiplication and Graph Algorithms, hosted by Akihiro Yamamoto
    • Toby Hocking (Tokyo Institute of Technology), Learning penalties for change-point detection using max-margin interval regression. hosted by Marco Cuturi
    • Ryuhei Uehara (JAIST) , Common developments of different polyhedra, hosted by Xuefeng Liang
    • The lecture is moved to July 25.
    • Jean-Philippe Vert (Mines ParisTech, Institut Curie), Learning with structured sparsity in computational biology, hosted by Marco Cuturi
    • Teruko Takada (Graduate School of Business, Osaka City University), Robust big data analysis of financial bubbles, hosted by Akihiro Yamamoto and Marco Cuturi
    • Shinichi Nakajima (Nikon Corporation), Global Solution and Theoretical Guarantee of Variational Bayesian PCA, hosted by Marco Cuturi
  • July 25 Thursday, Irregular day
    • Sourav S Bhovmick (Nanyang Technological University), Towards In Silico Network-driven Combination Drug Therapy: A Big “Small” Data Problem, hosted by Adam Jatowt

Abstracts and Detailed Information

April 8

16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
David Rappaport, Queen's University
Title: Triangles and T. rex: Digital technology in support of museum services
Abstract: Research Casting International (RCI) is one of the world's largest providers of museum technical services, which include specimen preparation, specimen restoration, specimen casting, specimen molding, specimen mounting, exhibit fabrication and exhibit moving.

RCI has a large digital collection of its specimens. These 3D data files can be used to create models using 3D printing or CNC milling. The digital models are also useful for planning poses for museum exhibits. Since January 2011 we have been working with RCI on various projects providing software solutions related to their 3D data collection.

We have provided software for mesh compression, post processing of meshes to ensure required geometric properties for printing and milling, as well as ancillary programs to streamline the workflow for their technicians. Ongoing projects involve mesh simplification, and mesh partitioning in support of the CNC milling process.

I will describe our activities with a focus on technical issues that we have resolved.

April 15

16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Antoine Deza, McMaster University
Title: Colourful linear programming and simplicial depth
Abstract: We present recent results and open questions dealing with a generalization of linear programming introduced by Bárány and Onn in 1997, and the associated generalization of the Carathéodory's Theorem proven by Bárány in 1982. In particular, we present recent generalizations of the Colourful Carathéodory Theorem due to Arocha et al. and Holmsen et al. and our strengthening of these. We provide a lower bound for the colourful version of the simplicial depth improving the earlier bounds of Bárány and Matoušek and of Stephen and Thomas, and verify that the conjectured lower bound is tight for dimension 4. Computational approaches for small dimensions are discussed.
Slides

April 22

16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Sanjeev Arora, Princeton University
Title: Is Machine Learning Tractable? — Three Vignettes

Abstract: Many tasks in machine learning (especially unsupervised learning) are provably intractable: NP-complete or worse. Nevertheless, researchers have developed heuristic algorithms to try to solve these tasks in practice. In most cases, these algorithms are heuristics with no provable guarantees on their running time or on the quality of solutions they return. Can we change this state of affairs?

This talk will suggest that the answer is yes, and describe three of our recent works as illustration. a) A new algorithm for learning topic models. (It applies to Linear Dirichlet Allocations of Blei et al. and also to more general topic models. It provably works under some reasonable assumptions and in practice is up to 50 times faster than existing software like Mallet. It relies upon a new procedure for nonnegative matrix factorization.) b) What classifiers are worth learning? (Can theory illuminate the contentious question of what binary classifier to learn: SVM, Decision tree, etc.?) c) Provable ICA with unknown gaussian noise. (An algorithm to provably learn a “manifold” with small number of parameters but exponentially many “interesting regions.”)

The talk will be self-contained.

(Based upon joint works with Rong Ge, Ravi Kannan, Ankur Moitra, Sushant Sachdeva.)

May 2

This seminar takes place on Thursday
16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Alejandro Ribes, Department SINETICS, EDF R&D, France
Title: Image Spectrometers for the scanning of fine-art paintings
Abstract: The concept of a Spectral Image is fundamental to this presentation. Such an image contains a spectral reflectance per pixel (image element) instead of the traditional three values representing color. Acquiring a spectral image requires an image spectrometer, which can be considered as an advanced type of digital camera. A straightforward property of spectral images is that high-color fidelity is easily obtained. There are some applications where high-fidelity color is crucial. One clear example is when capturing an image of a fine-art painting in a museum. In this case, the people that will look at the image (in printed form or in a screen) are curators, art-historians or other professionals that are used to perceiving small subtleties in color. Furthermore applications such as the generation of underdrawings, virtual restoration of paintings or pigment identification are feasible using this acquisition technology.

In this presentation I will describe the technology and signal processing methods involved in spectral imaging. I will also present examples of systems used in actual museums. Most examples images, such as multispectral scan of the Mona Lisa by Leonardo da Vinci, come for the CRISATEL European project and from the Louvre Museum in Paris.

May 9

This seminar takes place on Thursday
16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Mia Persson Department of Computer Science, Malmö University, Sweden
Title: A fast parallel algorithm for minimum-cost small integral flows (joint work with Andrzej Lingas)
Abstract: We present a new approach to the minimum-cost integral flow problem for small values of the flow. It reduces the problem to the tests of simple multivariate polynomials over a finite field of characteristic two for non-identity with zero. In effect, we show that a minimum-cost flow of value $k$ in a network with $n$ vertices, a sink and a source, integral edge capacities and positive integral edge costs polynomially bounded in $n$ can be found by a randomized PRAM, with errors of exponentially small probability in $n,$ running in $O(k\log (kn)+\log^2 (kn))$ time and using $2^{k}(kn)^{O(1)}$ processors. Thus, in particular, for the minimum-cost flow of value $O(\log n),$ we obtain an $RNC^2$ algorithm.
Slides

May 13

16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Stanko Trifkovic Social Informatics, GSI
Title: Spatial Distribution and Hidden Trees in Bitterlich’s Angle-Count Sampling
Abstract: Bitterlich’s method which he called angle-count sampling goes by a variety of names, including horizontal point sampling. This method has been extensively studied in the past and is widely used in estimating relative basal-area of trees in forests. Although concerns regarding the use of angle-count sampling in highly diverse forests still remain, our understanding of influences caused by different spatial distributions of trees is still limited. This study presents sampling simulations applying angle-count sampling in virtual forests with regular, random and clustered spatial distributions of trees, and with normal or exponential distributions of tree diameters. The findings suggest that the counts with angle-count sampling fitting to a widely known Poisson distribution can be used to indicate uniformly random spatial distributions of trees. Indices based on the counts do not require any distances to be measured and are likely to be independent from tree diameter distribution or applied basal-area factor. Moreover, this study demonstrates that the clustering induces a significant increase in numbers of the “hidden trees”, which usually cause surveyors to move sideways to check whether they should be counted or not. Clustering of the trees is also likely to highly reduce a statistical confidence to basal-area estimates; the estimates are significantly less precise in clustered than in random or in regular populations. Therefore, indexing spatial distributions of trees in forests should be a regular practice in forest inventories. The Bitterlich’s angle-count method can be one of the practical choices.

Marco Cuturi Intelligence Science & Technology, GSI
Title: Mean Reversion with a Variance Threshold
Abstract: Starting from a sample path of a multivariate stochastic process, we study several techniques to isolate linear combinations of the variables with a maximal amount of mean reversion, while constraining the variance of the combination to be larger than a given threshold. We show that many of the optimization problems arising in this context can be solved exactly using semidefinite programming and a variant of the S-lemma. In finance, these methods can be used to isolate statistical arbitrage opportunities, i.e. mean reverting baskets with enough variance to overcome market friction. In a more general setting, mean reversion and its generalizations can also be used as a proxy for stationarity, while variance simply measures signal strength.

May 20

16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Aapo Hyvärinen University of Helsinki, ATR
Title: Analysing brain waves by unsupervised learning methods
Abstract: Unsupervised machine learning methods have been succesful in analysing EEG and MEG data (“brain waves”) in two ways. First, they can remove technical artifacts from the data, and second, they can even decompose brain activity into more understandable components. Such methods are of particular interest in analysing brain activity when there is no particular stimulation and the brain is more or less at rest. In this talk, I will present some basic results an how unsupervised learning methods can be uses to analyse such brain waves. The method used are centered around independent component analysis, further including Bayesian networks or structural equation models.

May 27

16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Xuefeng Liang Intelligence Science & Technology, GSI
Title: Segmentation of multiple interdependent motions using potential surface
Abstract: A challenge in motion segmentation is that different motions are often mixed up and interdependent in the real data. Since the 2D representation of the dependent motions is obscure, it leads the segmentation difficult. In this talk, we will discuss a motion segmentation and recovery method for complex scenarios. Theoretically speaking, 2D representation of interdependent motions cannot be well segmented in 2D space, we thus first transform the 2D motion vector field into the 3D potential surface. In which, different motions are placed onto different layers so that they can be segmented much easier. By applying the surface fitting, the potential surfaces of global and local motions are then estimated. Finally, the recovered motions are obtained by projecting the segmented potential surfaces back to motion field. Using potential surface makes our method able to deal with both independent and dependent, rigid and non-rigid motions without prior knowledge. We will also explore its application in video alignment and action recognition.

Adam Jatowt Social Informatics department
Title: Estimating focus time of documents
Abstract: Temporality is an important characteristic of text documents. While some documents are clearly atemporal, many can be mapped to certain time periods. In this talk, we will introduce the problem of estimating focus time of documents. Document focus time is defined as the time period to which the content of a document refers to and is considered as a complementary dimension to its creation time or timestamp. We will describe several estimators of focus time by utilizing external knowledge bases such as news article collections which contain explicit temporal references. We will then show the effectiveness of our methods on diverse datasets of documents about historical events in several countries and discuss how focus time can be combined with document timestamp for making possible issuing complex temporal queries in longitudinal document collections. The talk will be concluded by the reference to our research of automatically estimating focus time of images.

June 3

16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Francois Le Gall University of Tokyo
Title: Matrix Multiplication and Graph Algorithms
Abstract: In this talk I will describe how algorithms for matrix multiplication can be used to design asymptotically fast algorithms for several graph-theoretic tasks. I will start with the well-known application of algorithms for the usual matrix product of two real matrices, which has been the subject of intensive research since the discovery of the first truly subcubic-time algorithm by Strassen in 1969, to the problem of computing the all-pairs shortest paths in a graph. Then I will present other kinds of matrix products, and in particular matrix products over algebraic structures known as “semirings” (such as the Boolean matrix product and the distance matrix product) and explain how they can be applied to solve several graph-theoretical problems as well. Finally, I will describe new quantum algorithms for these matrix products over semirings, which are faster than the best known classical algorithms.

June 10

16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Toby Hocking Tokyo Institute of Technology
Title: Learning penalties for change-point detection using max-margin interval regression

Note: In this interactive talk I will require a volunteer from that audience to create a database of change-point annotations from which we will learn a penalty function!

Abstract: In segmentation models, the number of change-points is typically chosen using a penalized cost function. In this work, we propose to learn the penalty and its constants in databases of signals with weak change-point annotations. We propose a convex relaxation for the resulting interval regression problem, and solve it using accelerated proximal gradient methods. We show that this method achieves state-of-the-art change-point detection in a database of annotated DNA copy number profiles from neuroblastoma tumors.

June 17

16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Ryuhei Uehara, Japan Advanced Institute of Science and Technology
Title: Common developments of different polyhedra
Abstract: What do you imagine when you hear a word “development” or “net” of a polyhedron? A typical development would be a so-called “Latin-cross” that is obtained by cutting some edges of a cube. However, you can also obtain a tetrahedron from the Latin-cross by folding different lines. Including this one, you can obtain 23 different polyhedra from just one Latin-cross in 85 different folding ways. In this talk, I will demonstrate some polygons that can fold to two or more convex polyhedra, and give some related open problems.


July 1

16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Jean-Philippe Vert, Mines ParisTech, Institut Curie
Title: Learning with structured sparsity in computational biology
Abstract: Learning with structured sparsity has emerged as a powerful approach to estimate predictive models with complex or structured data. By defining specific non-smooth convex regularisers, one can encode specific prior knowledge into computationally efficient procedures. In this talk, I will describe several such approaches we have developed for applications in computational biology. Applications include the inference of prognostic models in cancer from genomic data, the detection of breakpoints in noisy DNA profiles, or the quantification of RNA isoforms from high-thoughput sequencing data.

July 8

16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Teruko Takada, Osaka City University
Title: Robust big data analysis of financial bubbles
Abstract: Reduction in the damage caused by financial bubbles is clearly a socially critical issue. However, the analysis of financial bubble is difficult in that we need to explain complex system given limited available information. We tackle this difficulty in the following two ways: (1) robust and efficient information extraction by developing new statistical methods and (2) increasing information itself by using big data such as high-frequency financial data and the SNS data from the web. Newly developed statistical methods and several new findings from the big data analysis will be presented in this talk. The possibility of designing the policy and system will also be discussed.

July 22

16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Shinichi Nakajima, Nikon Corporation
Title: Global Solution and Theoretical Guarantee of Variational Bayesian PCA
Abstract: For statistical models in which the rigorous Bayesian learning is computationally intractable, the variational Bayesian (VB) approximation is a good alternative with its efficient computation and automatic relevance determination (or model selection) property. This talk starts with a general explanation on the VB approximation, including the standard procedure to derive a tractable iterative algorithm, and its advantage over the MAP estimation. Then, I'll show our recent theoretical results on VB in the probabilistic PCA or matrix factorization with no missing entry. These results include an analytic-form global solution, bounds of the noise variance estimator, and a theoretical guarantee for perfect dimensionality recovery. The analytic-form solution not only provides a stable and fast algorithm for VB-PCA, but also forms a building block of efficient algorithms for more complicated models.

References: C. M. Bishop, Variational Principal Components. ICANN1999. http://research.microsoft.com/en-us/um/people/cmbishop/downloads/Bishop-VPCA-ICANN-99.pdf

S. Nakajima, M. Sugiyama, S. D. Babacan, Ryota Tomioka, “Global Analytic Solution of Fully-observed Variational Bayesian Matrix Factorization,” Journal of Machine Learning Research, vol.14, pp.1-37, 2013.

S. Nakajima, R. Tomioka, M. Sugiyama, S. D. Babacan, “Perfect Dimensionality Recovery by Variational Bayesian PCA,” Twenty-Sixth Annual Conference on Neural Information Processing Systems (NIPS2012).

S. Nakajima, M. Sugiyama, S. D. Babacan, “Variational Bayesian Sparse Additive Matrix Factorization,” Machine Learning (Special Issue of Selected Papers of ACML 2012), 2013.

July 25

This seminar takes place on Thursday
16:30 - 18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Sourav S Bhowmick, Nanyang Technological University
Title: Towards In Silico Network-driven Combination Drug Therapy: A Big “Small” Data Problem
Abstract: Interest in combination drug treatment arising from positive clinical experience in diseases, such as cancer, has sparked renewed research efforts in combination drugs. A key benefit of drug combinations is drug synergy where the overall therapeutic effect is greater than the sum of the individual drug effect. However, not all drug combinations produce such synergy. Hence, there is a need for a rational approach in developing synergistic drug combinations. Recently, several in silico approaches based on combinatorial optimization have been proposed for identifying drug combinations. However, these approaches are prohibitively expensive especially for problems with high degrees of freedom. Additionally, they are not synergistic and oblivious to off-target effects which play critical role in determining efficacy and safety of the drug combinations. This talk consists of two parts. First, for the benefit of the audience who are not familiar with biology, the fundamental biological concepts related to this talk will be introduced. Second, we give an overview of our journey towards the grand goal of designing and implementing a novel network-driven fast in silico framework for identifying drug combinations that are synergistic by analysing the molecular interactions at a systems level.