Motif sequence generation

MotifSequenceGenerator (Module)

This module generates random sequences of motifs under the constraint that the sequence has some total length ℓ satisfying q - δq ≤ ℓ ≤ q + δq. All of the main functionality is provided by the function random_sequence.
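As a small illustration of this acceptance condition, the predicate below checks whether a total length ℓ falls inside the allowed band. `within_target` is a hypothetical helper written for this page, not a function exported by MotifSequenceGenerator.

```julia
# Hypothetical helper (not part of MotifSequenceGenerator):
# true when a total length ℓ lies in the closed interval [q - δq, q + δq].
within_target(ℓ, q, δq = 0) = q - δq ≤ ℓ ≤ q + δq

within_target(30, 30)             # δq = 0, so only ℓ == q passes: true
within_target(29, 30)             # false
within_target(10.43, 10.0, 1.0)   # inside [9.0, 11.0]: true
```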

MotifSequenceGenerator.random_sequence (Function)
random_sequence(motifs::Vector{M}, q, limits, translate, δq = 0; kwargs...)

Create a random sequence of motifs of type M, under the constraint that the sequence has a "length" ℓ within q - δq ≤ ℓ ≤ q + δq. Return the sequence itself, as well as the sequence of indices of the motifs used to create it. A vector of probability weights can be given as a keyword argument; it dictates the sampling probability of each entry of motifs when the initial sequence is created.
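The docstring does not say exactly how the weights become probabilities. A minimal sketch, assuming each index is drawn with probability proportional to its weight (inverse-CDF sampling over the cumulative sums), could look like this; `weighted_index` is a hypothetical name and need not match the package's internals.

```julia
# Hypothetical sketch: draw an index with probability proportional to
# weights[i]. Assumes nonnegative weights with a positive sum.
function weighted_index(weights::AbstractVector{<:Real})
    c = cumsum(weights)                       # running totals
    (!isempty(c) && c[end] > 0) ||
        throw(ArgumentError("weights must be nonempty with a positive sum"))
    r = rand() * c[end]                       # uniform draw in [0, total)
    # First bin whose running total exceeds r; the fallback covers the rare
    # case where floating-point rounding pushes r up to the total itself.
    return something(findfirst(>(r), c), findlast(>(0), weights))
end

weighted_index([0.2, 0.5, 0.3])   # 2 about half of the time
```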

Here "length" means an abstract length defined for the type M through the limits and translate functions. It does not refer to the number of elements!
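To make the distinction concrete, the sketch below measures a contiguous sequence from the start of its first motif to the end of its last one. `seq_length` is a hypothetical helper, and plain `(start, end)` tuples stand in for motifs so that `limits` can simply be `identity`.

```julia
# Hypothetical helper: the abstract "length" of a contiguous sequence is the
# distance from the start of its first motif to the end of its last motif,
# as measured by the user-supplied `limits` function.
seq_length(seq, limits) = limits(last(seq))[2] - limits(first(seq))[1]

# Toy motifs given directly as (start, end) tuples.
seq = [(0, 4), (4, 9), (9, 15)]
length(seq)                 # 3 elements
seq_length(seq, identity)   # abstract length 15
```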

M can be any type, provided the following two functions are defined:

  • limits(motif) : A function that, given a motif, returns the (start, end) of the motif in the same units as q. This function establishes the measure of length, which is simply end - start.
  • translate(motif, t) : A function that, given a motif, returns a new motif translated by t (which may be negative or positive), in the same units as q.
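A quick way to sanity-check a pair of user-defined functions is to confirm that translating a motif shifts both of its limits by t and leaves its length unchanged. The helper `check_motif_interface` and the example type `Span` are assumptions made for illustration only.

```julia
# Hypothetical helper: checks that `translate` shifts both limits by exactly t
# and leaves the abstract length (end - start) unchanged.
function check_motif_interface(motif, limits, translate; t = 3)
    s, e = limits(motif)
    s2, e2 = limits(translate(motif, t))
    return s2 == s + t && e2 == e + t && (e2 - s2) == (e - s)
end

# Example type, assumed for illustration: a span with a start and a duration.
struct Span
    start::Int
    dur::Int
end
spanlimits(m::Span) = (m.start, m.start + m.dur)
spantranslate(m::Span, t) = Span(m.start + t, m.dur)

check_motif_interface(Span(2, 5), spanlimits, spantranslate)   # true
```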

Other Keywords

Please see the source code (use @which) for a full description of the algorithm.

  • tries = 5 : Maximum number of initial random sequences that are attempted.
  • taulcut = 2 : Maximum number of times an element is dropped from the initial guess.
  • summands = 3 : Maximum number of motifs that may be combined as a sum to complete a sequence.
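How exactly `summands` is used is documented only in the source code. As a hedged illustration of the general idea, the sketch below searches for at most `summands` motif lengths (repetition allowed) that close a remaining gap within a tolerance δ. `find_summands` is hypothetical and is not the package's algorithm.

```julia
# Hypothetical sketch: depth-first search over combinations with repetition of
# at most `summands` motif lengths whose sum equals `gap` within tolerance δ.
# Returns the chosen motif indices, or `nothing` if no combination works.
function find_summands(lengths::AbstractVector{<:Real}, gap::Real,
                       summands::Integer; δ = 0)
    chosen = Int[]
    function search(first_idx, remaining, depth)
        !isempty(chosen) && abs(remaining) <= δ && return copy(chosen)
        depth == 0 && return nothing
        for i in first_idx:length(lengths)
            lengths[i] <= remaining + δ || continue   # would overshoot the gap
            push!(chosen, i)
            found = search(i, remaining - lengths[i], depth - 1)
            found === nothing || return found
            pop!(chosen)                              # backtrack
        end
        return nothing
    end
    return search(1, gap, summands)
end

find_summands([4, 5, 6], 9, 3)   # [1, 2], i.e. 4 + 5
find_summands([4, 5, 6], 7, 3)   # nothing
```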

Simple Example

This example illustrates how MotifSequenceGenerator works using a simple struct. For a more realistic, and much more complex, example, see the example using music notes.


Let's say that we want to create a random sequence of "shouts", which are described by the struct

struct Shout
  shout::String
  start::Int
end

Let's first create a vector of shouts that will be used as the pool of possible motifs that will create the random sequence:

using Random
shouts = [Shout(uppercase(randstring(rand(3:5))), rand(1:100)) for k in 1:5]
5-element Vector{Main.Shout}:
 Main.Shout("WMU", 8)
 Main.Shout("3SQJ", 47)
 Main.Shout("EFW", 16)
 Main.Shout("BP25Z", 39)
 Main.Shout("OXP", 83)

Notice that, at this point, the values of the .start field of Shout are irrelevant: MotifSequenceGenerator translates all motifs to start at 0 while operating.
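Moving a motif to start at 0 needs nothing but the two user-supplied functions: translate it by minus its start. The sketch repeats the `shoutlimits` and `shouttranslate` definitions given in this example so that it runs on its own; `to_origin` is a hypothetical helper, not package API.

```julia
struct Shout
    shout::String
    start::Int
end
shoutlimits(s::Shout) = (s.start, s.start + length(s.shout) + 1)
shouttranslate(s::Shout, n) = Shout(s.shout, s.start + n)

# Hypothetical helper: shift a motif so that its start becomes 0.
to_origin(m, limits, translate) = translate(m, -limits(m)[1])

to_origin(Shout("3SQJ", 47), shoutlimits, shouttranslate)   # Shout("3SQJ", 0)
```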

Now, to create a random sequence, we need to define the two functions limits and translate for Shout:

shoutlimits(s::Shout) = (s.start, s.start + length(s.shout) + 1);

shouttranslate(s::Shout, n) = Shout(s.shout, s.start + n);
shouttranslate (generic function with 1 method)

In other words, the temporal length of a Shout is length(s.shout) + 1.
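Applied to the pool printed above, this definition gives the following motif lengths. The pool is copied by hand because `randstring` produces different strings on every run.

```julia
struct Shout
    shout::String
    start::Int
end
shoutlimits(s::Shout) = (s.start, s.start + length(s.shout) + 1)

# The pool printed above, copied verbatim.
shouts = [Shout("WMU", 8), Shout("3SQJ", 47), Shout("EFW", 16),
          Shout("BP25Z", 39), Shout("OXP", 83)]

# Abstract length of each motif: end - start == length(s.shout) + 1
motif_lengths = [e - s for (s, e) in shoutlimits.(shouts)]   # [4, 5, 4, 6, 4]
```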

We can now create random sequences of shouts that have a total length of exactly q (δq defaults to 0):

using MotifSequenceGenerator
q = 30
sequence, idxs = random_sequence(shouts, q, shoutlimits, shouttranslate)
sequence
6-element Vector{Main.Shout}:
 Main.Shout("3SQJ", 0)
 Main.Shout("WMU", 5)
 Main.Shout("BP25Z", 9)
 Main.Shout("3SQJ", 15)
 Main.Shout("OXP", 20)
 Main.Shout("BP25Z", 24)
sequence, idxs = random_sequence(shouts, q, shoutlimits, shouttranslate)
sequence
6-element Vector{Main.Shout}:
 Main.Shout("3SQJ", 0)
 Main.Shout("EFW", 5)
 Main.Shout("3SQJ", 9)
 Main.Shout("BP25Z", 14)
 Main.Shout("BP25Z", 20)
 Main.Shout("OXP", 26)
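The first sequence printed above can be checked by hand: each shout should start exactly where the previous one ends, and the whole sequence should span 30 units. The sketch copies that output verbatim; the variable names are illustrative.

```julia
struct Shout
    shout::String
    start::Int
end
shoutlimits(s::Shout) = (s.start, s.start + length(s.shout) + 1)

# The first sequence printed above, copied verbatim.
sequence = [Shout("3SQJ", 0), Shout("WMU", 5), Shout("BP25Z", 9),
            Shout("3SQJ", 15), Shout("OXP", 20), Shout("BP25Z", 24)]

lims = shoutlimits.(sequence)
# Each motif starts exactly where the previous one ends...
contiguous = all(lims[i][2] == lims[i+1][1] for i in 1:length(lims)-1)
# ...and the whole sequence spans [0, q] with q = 30.
total = lims[end][2] - lims[1][1]
```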

Notice that it is impossible to create a sequence of length, e.g., 7 with the above pool, because the motif lengths are 4, 5 and 6 and no combination of them sums to 7. Calling random_sequence(shouts, 7, shoutlimits, shouttranslate) would therefore throw an error.
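One way to see why 7 is out of reach is a standard reachability table over the motif lengths 4, 5 and 6 (repetition allowed). This ignores the package's own search limits such as `summands`; `reachable_totals` is a hypothetical helper that only shows which totals are attainable in principle.

```julia
# Hypothetical helper: unbounded-knapsack style reachability.
# reachable[n + 1] is true when total n can be built from `lengths`.
function reachable_totals(lengths::Vector{Int}, qmax::Int)
    reachable = falses(qmax + 1)
    reachable[1] = true                  # total 0: the empty sequence
    for n in 1:qmax, l in lengths
        if l <= n && reachable[n - l + 1]
            reachable[n + 1] = true
        end
    end
    return [n for n in 0:qmax if reachable[n + 1]]
end

reachable_totals([4, 5, 6], 10)   # [0, 4, 5, 6, 8, 9, 10]: 7 is missing
```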

Floating point lengths

The lengths of the motifs do not have to be integers. When using motifs with floating-point lengths, it is advisable to pass a non-zero δq to random_sequence, because a sum of floating-point lengths will practically never equal q exactly. The following example modifies the Shout struct to use floating-point lengths.
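The underlying issue is ordinary floating-point rounding: even simple sums rarely land exactly on a target, while a tolerance band such as q ± δq is robust to it.

```julia
# Floating-point sums rarely hit a target exactly.
println(0.1 + 0.2 == 0.3)                # prints false
# Comparing within a tolerance band, in the spirit of q ± δq, succeeds.
println(abs((0.1 + 0.2) - 0.3) <= 1e-9)  # prints true
```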

struct FloatShout
  shout::String
  dur::Float64
  start::Float64
end

rs(x) = uppercase(randstring(x))

shouts = [FloatShout(rs(rand(3:5)), rand()+1, rand()) for k in 1:5]
shoutlimits(s::FloatShout) = (s.start, s.start + s.dur);
shouttranslate(s::FloatShout, n) = FloatShout(s.shout, s.dur, s.start + n);

q = 10.0
δq = 1.0

r, s = random_sequence(shouts, q, shoutlimits, shouttranslate, δq)

r
7-element Vector{Main.FloatShout}:
 Main.FloatShout("KTX", 1.1978245713254463, 0.0)
 Main.FloatShout("N0ECN", 1.3535715317749588, 1.1978245713254463)
 Main.FloatShout("RQNPL", 1.2387061448437329, 2.551396103100405)
 Main.FloatShout("KTX", 1.1978245713254463, 3.790102247944138)
 Main.FloatShout("KGL", 1.72836212834809, 4.9879268192695845)
 Main.FloatShout("XI8", 1.8578670877760848, 6.716288947617675)
 Main.FloatShout("XI8", 1.8578670877760848, 8.57415603539376)
s
7-element Vector{Int64}:
 3
 1
 2
 3
 4
 5
 5
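The output can be verified in the same way as before: the total length ℓ should lie in [q - δq, q + δq] = [9, 11], and entries that share an index in s should be the same motif. The sketch copies r and s verbatim from the output above.

```julia
struct FloatShout
    shout::String
    dur::Float64
    start::Float64
end
shoutlimits(s::FloatShout) = (s.start, s.start + s.dur)

# The sequence r and the indices s printed above, copied verbatim.
r = [FloatShout("KTX", 1.1978245713254463, 0.0),
     FloatShout("N0ECN", 1.3535715317749588, 1.1978245713254463),
     FloatShout("RQNPL", 1.2387061448437329, 2.551396103100405),
     FloatShout("KTX", 1.1978245713254463, 3.790102247944138),
     FloatShout("KGL", 1.72836212834809, 4.9879268192695845),
     FloatShout("XI8", 1.8578670877760848, 6.716288947617675),
     FloatShout("XI8", 1.8578670877760848, 8.57415603539376)]
s = [3, 1, 2, 3, 4, 5, 5]

q, δq = 10.0, 1.0
ℓ = shoutlimits(r[end])[2] - shoutlimits(r[1])[1]   # ≈ 10.432
accepted = q - δq ≤ ℓ ≤ q + δq
# Motifs sharing an index in s must be the same shout with the same duration.
consistent = all(r[i].shout == r[j].shout && r[i].dur == r[j].dur
                 for i in eachindex(s), j in eachindex(s) if s[i] == s[j])
```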