
The matching issue
The basic idea behind my approach is very simple. The relations I consider inside each domain are measures of dissimilarity: two musical segments, or two images, are more or less similar, and this can be measured. Music and images are then treated as forms, and forms are mapped onto one another. More specifically, I consider that in each domain the objects (musical segments, images) are represented by points in a multidimensional space. A piece of music, being a sequence of segments, is then represented by a broken line; to map images onto the piece of music, a similar path has to be found in the space of images (which may have a different number of dimensions). What is meant here by “similar” requires clarification. A mapping is regarded as good when the distances between the images mapped onto the musical segments are close to the distances between the corresponding segments. This of course requires some normalization, as the units on the different axes are sometimes arbitrary, and so are the distances.
The figure above illustrates the mapping issue in two-dimensional spaces. The broken line on the left is the target (say, the segments of a musical piece), onto which a subset of the points in the right-hand box (representing images) must be mapped.
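The geometric picture can be sketched in a few lines of Python. The coordinates below are invented for illustration, and dividing each step by the total path length is only one possible normalization, assumed here because the text does not prescribe one. The sketch looks only at consecutive points along each broken line and shows that a path in a 3-dimensional space and a path in a 2-dimensional space can have the same shape once the arbitrary units are removed.

```python
import math

def step_lengths(points):
    """Distances between consecutive points of a broken line."""
    return [math.dist(p, q) for p, q in zip(points, points[1:])]

def normalized(lengths):
    """Rescale step lengths so that they sum to 1 (assumed normalization)."""
    total = sum(lengths)
    return [x / total for x in lengths]

# Hypothetical data: three musical segments as points in a 3-D feature space
segments = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
# and three candidate images as points in a 2-D feature space.
images = [(0.0, 0.0), (0.6, 0.8), (0.6, 3.2)]

segment_profile = normalized(step_lengths(segments))
image_profile = normalized(step_lengths(images))

# The raw distances differ by a constant factor, but the profiles coincide.
for s, i in zip(segment_profile, image_profile):
    print(f"{s:.4f}  {i:.4f}")
```

With these made-up points the two normalized profiles match, which is the situation a good mapping tries to approach.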
Mathematical formulation
In mathematical terms, the problem to tackle can be phrased as follows, in a slightly more general way that does not require elements to be seen as points in multidimensional spaces.
Let us call C a set {c1, c2, c3 … cN} of entities on which a dissimilarity d has been defined. Typically C describes the target set (a set of musical segments for instance) on which elements from another (source) space (a set of images) are mapped.
S denotes the source space, which belongs to another sensory field. It is larger than C (|S| > N). A dissimilarity d’ is defined on S.
The mapping is an injective function f from C to S. The distance between C and f(C) is defined in the following way:
λ is the scaling parameter required to compare the dissimilarities in C and S. Other ways to rescale could be considered but this would not impact the methods that are presented.
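To make these quantities concrete, here is a minimal Python sketch of such a cost. It assumes that Δ sums, over all pairs of elements of C, the absolute difference between d and λ times d’ of the mapped elements. That particular form is an assumption, as are the function name `delta` and the toy numeric data; a squared difference, or a sum restricted to consecutive elements, would be coded the same way.

```python
from itertools import combinations

def delta(mapping, d, d_prime, lam):
    """Assumed cost: sum over unordered pairs of |d(ci, cj) - lam * d'(f(ci), f(cj))|.

    mapping  -- dict sending each element of C to an element of S (the function f)
    d        -- dissimilarity on C
    d_prime  -- dissimilarity on S
    lam      -- scaling parameter (lambda)
    """
    # f must be injective: no two elements of C may share an image in S.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping must be injective")
    return sum(
        abs(d(ci, cj) - lam * d_prime(mapping[ci], mapping[cj]))
        for ci, cj in combinations(mapping, 2)
    )

# Toy example: elements are plain numbers, dissimilarity is the absolute difference.
def absolute_difference(a, b):
    return abs(a - b)

C = [0, 1, 3]
S = [0, 2, 6, 10]          # |S| > |C|

good = {0: 0, 1: 2, 3: 6}  # preserves the shape of C up to a factor of 2
bad = {0: 0, 1: 6, 3: 2}   # same images, assigned in a different order

good_cost = delta(good, absolute_difference, absolute_difference, lam=0.5)
bad_cost = delta(bad, absolute_difference, absolute_difference, lam=0.5)
print(good_cost, bad_cost)
```

In this toy case the first mapping reproduces the dissimilarities of C exactly once rescaled, so its cost is zero, while the second one is penalized.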
How to find f that minimizes Δ? In most cases, an exhaustive search is not possible. For instance, if a piece of music is cut into 20 segments and the source library of images contains 100 images, about 1.3 × 10^39 different possibilities would have to be examined.
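This count is the number of injective functions from a set of 20 elements into a set of 100 elements, that is 100 × 99 × … × 81 = 100!/80!. The short Python check below reproduces the order of magnitude; the function name is chosen here for illustration, and `math.perm` requires Python 3.8 or later.

```python
import math

def count_injective_mappings(n_targets, n_sources):
    """Number of injective functions from n_targets elements into n_sources elements."""
    # Ordered selections without repetition: n_sources! / (n_sources - n_targets)!
    return math.perm(n_sources, n_targets)

count = count_injective_mappings(20, 100)
print(f"{count:.1e}")
```

Even at a billion evaluations of Δ per second, enumerating that many mappings would take on the order of 10^30 seconds, which is why heuristic search is needed.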