# Rapid information discovery

### From CYPHYNETS

## Topology of Data: Exploiting Minimal Geometry for Rapid Information Discovery

The increasing pervasiveness and accuracy of sensors, the unprecedented automation of data collection, extremely cheap storage, and rapid dissemination over communication networks have resulted in an explosion of data. In particular, advances in imaging and scanning techniques have produced massive data sets in several fields, from genome sequences, medical images and satellite imagery to multimedia, scans of molecular surfaces and environmental recordings [5]. Typical examples of such databases are sets of measurements sampled from spatial structures, also known as point cloud data (see Figure 4). Examples of point cloud data extend from traditional probes of objects in 3D to a wide variety of high-dimensional sets generated by neuronal activity, network evolution, astronomical observation, robotic sensing and similar sources.

Hidden in these high-volume databases and data flows are highly nonlinear properties and geometric structures that are easier to describe qualitatively than quantitatively. Moreover, even though these data sets are high-dimensional, they typically reside on sets of much lower dimension and non-trivial topology. Traditional statistical methods, data-mining and machine-learning techniques are unable to decipher this information because of the curse of dimensionality, linearity assumptions and the failure to incorporate global topological information. Therefore, new and efficient computational techniques that employ dimensionality reduction and topology discovery hold great promise for resolving outstanding scientific problems and generating new applications through rapid information discovery [4], [5]. These techniques can be grouped broadly under the subject of computational topology.
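
As a minimal illustration of topology discovery on point cloud data, the sketch below samples a circle embedded in a 10-dimensional space, builds a Vietoris-Rips complex at a single scale, and reads off the number of connected components and loops (the Betti numbers b0 and b1) from the ranks of the boundary matrices. The function names, the sample size, the ambient dimension, the noise level and the scale `eps` are illustrative assumptions chosen for this example; they are not taken from the programs cited above, and a practical analysis would typically scan many scales (persistent homology) rather than fix one.

```python
import itertools
import numpy as np


def circle_point_cloud(n=20, ambient_dim=10, noise=0.005, seed=0):
    """Sample n points from a unit circle isometrically embedded in R^ambient_dim."""
    rng = np.random.default_rng(seed)
    angles = 2 * np.pi * np.arange(n) / n
    circle = np.column_stack([np.cos(angles), np.sin(angles)])  # shape (n, 2)
    # Reduced QR gives a (ambient_dim, 2) matrix with orthonormal columns,
    # i.e. a random 2-plane in the ambient space (hypothetical embedding).
    frame, _ = np.linalg.qr(rng.normal(size=(ambient_dim, 2)))
    return circle @ frame.T + noise * rng.normal(size=(n, ambient_dim))


def rips_complex(points, eps):
    """Edges and triangles of the Vietoris-Rips complex at scale eps."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    close = dist <= eps
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2) if close[i, j]]
    triangles = [
        (i, j, k)
        for i, j, k in itertools.combinations(range(n), 3)
        if close[i, j] and close[i, k] and close[j, k]
    ]
    return edges, triangles


def betti_numbers(n_vertices, edges, triangles):
    """Betti numbers b0 and b1 over the reals, from boundary-matrix ranks."""
    edge_index = {e: c for c, e in enumerate(edges)}
    # d1 maps edges to vertices: boundary of (i, j) is j - i.
    d1 = np.zeros((n_vertices, len(edges)))
    for c, (i, j) in enumerate(edges):
        d1[i, c], d1[j, c] = -1.0, 1.0
    # d2 maps triangles to edges: boundary of (i, j, k) is (j,k) - (i,k) + (i,j).
    d2 = np.zeros((len(edges), len(triangles)))
    for c, (i, j, k) in enumerate(triangles):
        d2[edge_index[(j, k)], c] = 1.0
        d2[edge_index[(i, k)], c] = -1.0
        d2[edge_index[(i, j)], c] = 1.0
    rank1 = np.linalg.matrix_rank(d1) if edges else 0
    rank2 = np.linalg.matrix_rank(d2) if triangles else 0
    b0 = n_vertices - rank1          # connected components
    b1 = len(edges) - rank1 - rank2  # independent loops
    return b0, b1


points = circle_point_cloud()
edges, triangles = rips_complex(points, eps=0.75)
b0, b1 = betti_numbers(len(points), edges, triangles)
print("b0 =", b0, "b1 =", b1)
```

Although each point has ten coordinates, the computation uses only pairwise distances, and for this choice of scale it recovers one connected component and one loop, which is the topology of the underlying circle. Choosing `eps` too small disconnects the samples and choosing it too large fills in the loop, which is why the scale is flagged above as an assumption.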

My interest in this subject was originally motivated by the data sets generated by the complex networked systems described in the previous section. My thesis was one of the earliest research works on networked sensing using this approach. I am especially interested in pushing the research frontier (cf. Figure 1) in networked systems in all directions of complexity by asking the following questions. Can complicated nonlinear dynamics, whether individual or collective, be described more effectively and robustly by global qualitative measures? Can classical notions of topological and symbolic dynamics be augmented with computable descriptions of feedback, stability, uncertainty and, most importantly, undetermined input or control? Another source of interest for me is the point cloud data found in sensory data generated in robotics [4]. Can massive measurement sets from laser scans, point probes or sonar echoes be assembled, parsed and abstracted for motion planning, environmental modeling, localization and mapping?

Topological techniques for information discovery share two prominent characteristics: the inference of high-dimensional properties from low-dimensional projections, and the emergence of global continuous information from the assembly of discrete local properties [6]. In my postdoctoral work, I have already explored both of these themes in particular settings. For example, how can certain operators on combinatorial spaces be refined to give an analogous meaning in the smooth setting? How do we interpret and visualize ‘flows’ on discrete spaces in a fashion similar to continuous spaces? And most importantly, how do these approximations obey global topological obstructions? While such questions can sometimes be answered in particular situations, generalizing these results is an important research direction towards realizing discrete versions of differential calculus, geometry and mechanics. Such discrete and computable calculi have been recognized by researchers in many areas as important for future applications in electromagnetics, fluid dynamics, computer graphics, data visualization and computer vision.

## References

[4] Sensor Topology for Minimal Planning (SToMP), DARPA-DSO program.

[5] Topological Data Analysis (TDA), DARPA-DSO program.

[6] Millennium Prize Problems in Mathematics, Clay Mathematics Institute / American Mathematical Society, 2000.