# Getting started

Gecos generates a color scheme by performing a Metropolis Monte Carlo optimization in color space. In short, the algorithm tries to assign colors to the symbols (e.g. amino acids) such that their pairwise perceptual differences are proportional to the respective distances calculated from a substitution matrix.
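The idea can be illustrated with a minimal Metropolis Monte Carlo sketch. This is not Gecos' implementation: the toy distance matrix, the function names, the step size and the temperature are all assumptions chosen for illustration, and the coordinates are not restricted to a displayable color space.

```python
import math
import random

# Hypothetical target distances between three symbols (not derived from BLOSUM62)
TOY_DISTANCES = [
    [0.0, 10.0, 20.0],
    [10.0, 0.0, 15.0],
    [20.0, 15.0, 0.0],
]


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def pairwise_score(coords, target):
    """Sum of squared deviations between actual and target distances."""
    score = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            score += (distance(coords[i], coords[j]) - target[i][j]) ** 2
    return score


def metropolis(target, n_steps=5000, step_size=2.0, temperature=1.0, seed=0):
    """Move one random symbol per step and accept by the Metropolis criterion."""
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.uniform(-10, 10) for _ in range(3)] for _ in range(n)]
    score = pairwise_score(coords, target)
    initial_score = score
    best_score = score
    best_coords = [c[:] for c in coords]
    for _ in range(n_steps):
        i = rng.randrange(n)
        old = coords[i][:]
        coords[i] = [x + rng.uniform(-step_size, step_size) for x in old]
        new_score = pairwise_score(coords, target)
        delta = new_score - score
        # Always accept improvements, accept deteriorations with
        # probability exp(-delta / temperature)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            score = new_score
            if score < best_score:
                best_score = score
                best_coords = [c[:] for c in coords]
        else:
            coords[i] = old  # reject the move
    return initial_score, best_score, best_coords


initial, best, coords = metropolis(TOY_DISTANCES)
print(f"score: {initial:.2f} -> {best:.2f}")
```

Occasionally accepting a worse conformation is what allows such an optimization to escape local minima. The sketch therefore keeps track of the best conformation seen so far instead of returning the last one.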

There are dozens of different color spaces, with RGB probably being the most common one. Despite its popularity, the RGB color space does not do well when it comes to perceptual uniformity: changing an RGB color value by a particular amount does not result in a visual difference of the same amount. For this reason Gecos uses the CIE L*a*b* color space instead, which is approximately perceptually uniform. The color space consists of three components:

• L* - The lightness of the color. 0 is completely black and 100 is completely white.

• a* - The green-red component. Green is in the negative direction, red is in the positive direction.

• b* - The blue-yellow component. Blue is in the negative direction, yellow is in the positive direction.

While values for a* and b* are not limited in either direction, only a small region is displayable, i.e. can be converted into RGB colors. Consequently, the optimization process is also restricted to this displayable subspace. The following plots show the displayable a*b* space at two different L* levels. The gray area consists of L*a*b* values that cannot be converted into RGB space.
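To make "displayable" concrete, the following sketch converts an L*a*b* color into linear sRGB using the standard CIE formulas with a D65 white point and checks whether all components fall into the range from 0 to 1. The function names and the tolerance are assumptions for illustration; this is not how Gecos itself performs the check.

```python
# D65 reference white (X, Y, Z)
WHITE = (0.95047, 1.0, 1.08883)

# Standard matrix for converting XYZ into linear sRGB
XYZ_TO_LINEAR_RGB = (
    (3.2406, -1.5372, -0.4986),
    (-0.9689, 1.8758, 0.0415),
    (0.0557, -0.2040, 1.0570),
)


def lab_to_linear_rgb(L, a, b):
    """Convert a CIE L*a*b* color into linear (not gamma-encoded) sRGB."""
    fy = (L + 16) / 116
    fx = fy + a / 500
    fz = fy - b / 200

    def f_inv(t):
        delta = 6 / 29
        if t > delta:
            return t ** 3
        return 3 * delta ** 2 * (t - 4 / 29)

    x, y, z = (w * f_inv(f) for w, f in zip(WHITE, (fx, fy, fz)))
    return tuple(m[0] * x + m[1] * y + m[2] * z for m in XYZ_TO_LINEAR_RGB)


def is_displayable(L, a, b, tol=1e-3):
    """True if the color lies inside the RGB gamut (within a small tolerance)."""
    # The sRGB gamma curve maps [0, 1] onto [0, 1],
    # so checking the linear components is sufficient
    return all(-tol <= c <= 1 + tol for c in lab_to_linear_rgb(L, a, b))


print(is_displayable(50, 0, 0))     # medium gray
print(is_displayable(50, 0, -120))  # extremely saturated blue
```

The small tolerance absorbs rounding errors in the conversion matrix, which would otherwise reject colors that lie exactly on the gamut boundary.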

## Installation

In order to use Gecos you need to have Python (at least 3.6) installed. Furthermore, the following Python packages are required:

• biotite

• numpy

• matplotlib

• scikit-image

If these prerequisites are met, Gecos can be installed via

$ pip install gecos

## Basic usage

The simplest invocation is

$ gecos


By default, Gecos uses the BLOSUM62 matrix to generate a color scheme, which is printed to the console. Alternatively, you can save the color scheme into a file via the -f option. The color scheme is written in a Biotite-compatible JSON format and will look something like this:

{
"name": "scheme",
"alphabet": ["A","C","D","E","F","G","H","I","K","L",
"M","N","P","Q","R","S","T","V","W","Y"],
"colors": {
"A": "#7c7b8b",
"C": "#17ebd9",
"D": "#740365",
"E": "#992651",
"F": "#f3df8c",
"G": "#140a1a",
"H": "#b41308",
"I": "#e8eafe",
"K": "#fe83aa",
"L": "#f0eee6",
"M": "#fcdbce",
"N": "#d0388b",
"P": "#ba82fd",
"Q": "#873429",
"R": "#fe7878",
"S": "#744759",
"T": "#4c5e53",
"V": "#afcbe0",
"W": "#d5e70b",
"Y": "#aa7e00"
}
}


The value of "name" is the name of the color scheme. It can be adjusted with the --name option. "alphabet" maps to the list of symbols in the alphabet the scheme is intended for. The most important field is "colors": it maps to a dictionary in which a color is assigned to each symbol of the alphabet. Even though Gecos assigns a color to all symbols in "alphabet", the format allows "colors" to assign colors to only a subset of the symbols in "alphabet".
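As an illustration of this structure, the following sketch reads a shortened scheme with Python's standard json module and converts the hex colors into RGB tuples. The embedded scheme is an abbreviated excerpt and hex_to_rgb is a helper written for this example; neither is part of Gecos.

```python
import json

# Shortened excerpt of a scheme; "D" intentionally has no color
scheme_json = """
{
    "name": "scheme",
    "alphabet": ["A", "C", "D"],
    "colors": {
        "A": "#7c7b8b",
        "C": "#17ebd9"
    }
}
"""


def hex_to_rgb(hex_color):
    """Convert a '#rrggbb' string into an (r, g, b) tuple of integers."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


scheme = json.loads(scheme_json)
colors = scheme["colors"]

# "colors" may cover only a subset of "alphabet", so check membership
rgb_colors = {
    symbol: hex_to_rgb(colors[symbol])
    for symbol in scheme["alphabet"]
    if symbol in colors
}
uncolored = [s for s in scheme["alphabet"] if s not in colors]

print(rgb_colors)
print(uncolored)
```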

Note

Although the format is compliant with the Biotite color scheme format, the Biotite amino acid alphabet contains additional symbols for the ambiguous amino acids and the stop codon. Hence incorporating a Gecos JSON file into the Biotite source code requires that the symbols "B", "Z", "X" and "*" are appended at the end of the "alphabet" value. Editing "colors" is not necessary.
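The adjustment described in the note could be scripted along these lines. This is a minimal sketch operating on an already parsed scheme dictionary; the function name extend_alphabet is hypothetical and not part of Gecos or Biotite.

```python
# Symbols the note asks to append: ambiguous amino acids and the stop codon
EXTRA_SYMBOLS = ["B", "Z", "X", "*"]


def extend_alphabet(scheme):
    """Return a copy of the scheme with the extra symbols appended."""
    extended = dict(scheme)
    alphabet = list(scheme["alphabet"])
    # Append at the end and skip symbols that are already present
    alphabet += [s for s in EXTRA_SYMBOLS if s not in alphabet]
    extended["alphabet"] = alphabet
    return extended


scheme = {
    "name": "scheme",
    "alphabet": ["A", "C", "D"],
    "colors": {"A": "#7c7b8b", "C": "#17ebd9", "D": "#740365"},
}
extended = extend_alphabet(scheme)
print(extended["alphabet"])
```

The "colors" dictionary is left untouched, since the format does not require a color for every symbol.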

As the color space was not restricted in any way, the generated color scheme contains the whole lightness range - from pitch-black to pure white. Alignments visualized with this color scheme look accordingly:

Although this scheme has a high contrast and the color differences are well aligned with the substitution matrix, such a wide lightness range is seldom intended. To constrain the lightness range, you can give Gecos a minimum and a maximum lightness level:

$ gecos --lmin 60 --lmax 75 -f a_color_scheme.json

However, the minimum and the maximum lightness should not be too close, otherwise the contrast will be quite low.

## Color constraints

The a* and b* components can be constrained in the same way to create a color scheme that is shifted toward a certain hue. This can, for example, be used to create a color scheme for people with red-green color vision deficiency. For this purpose the green region is removed, i.e. a* starts at 0. To compensate for the lost contrast, the lightness range is increased:

$ gecos --amin 0 --lmin 50 --lmax 80 -f no_green_scheme.json


Likewise, the saturation range can be set. The saturation is the Euclidean distance of the a*b* components to gray (0, 0):

$ gecos --smin 30 --lmin 55 --lmax 75 -f saturated_scheme.json

Last but not least, you can constrain a symbol to a specific L*a*b* color via the --constraint or -c option. The optimization will not change the color of constrained symbols. In the following example, we want alanine to be gray and tryptophan to be blue, both with a lightness of 70:

$ gecos -c A 70 0 0 -c W 70 -10 -45 --lmin 60 --lmax 75 -f constrained_scheme.json
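The saturation definition and the range constraints from this section can be sketched as a simple membership test. The function names are assumptions made for this example, and the parameters merely mirror the --lmin, --lmax, --amin and --smin options; the sketch does not check whether a color is displayable.

```python
import math


def saturation(a, b):
    """Euclidean distance of the a*b* components to gray (0, 0)."""
    return math.hypot(a, b)


def in_allowed_space(L, a, b, lmin=0.0, lmax=100.0, amin=-math.inf, smin=0.0):
    """Check a L*a*b* color against lightness, a* and saturation limits."""
    return (
        lmin <= L <= lmax
        and a >= amin
        and saturation(a, b) >= smin
    )


# The tryptophan constraint from the example above: L*=70, a*=-10, b*=-45
print(saturation(-10, -45))
print(in_allowed_space(70, -10, -45, lmin=60, lmax=75))
# Gray has a saturation of 0 and violates any positive minimum saturation
print(in_allowed_space(70, 0, 0, smin=30))
```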


## Adjusting the contrast

Gecos' optimization process includes an additional score term that penalizes low-contrast color conformations, i.e. a low average distance between the symbols. This behavior can be customized by setting the --contrast option. When the value is 0, low-contrast schemes are not penalized. The higher the value, the more the symbols are driven to the edges of the color space. A bit of experimentation is necessary to find an optimal value for this option. The following example creates a high contrast color scheme:

$ gecos --contrast 2000 --lmin 60 --lmax 75 -f high_contrast_scheme.json

Warning

Use the --contrast parameter with caution. Increasing the contrast parameter also means that the substitution matrix is weighted less strongly. Consequently, although a high contrast color scheme may look appealing, it may not represent the similarity of symbols very well.

## Color space and scheme preview

You do not need to create an alignment yourself in order to evaluate a newly created color scheme. Gecos provides some visualization capabilities of its own, so you can directly discard a color scheme you do not like. First, you can output your selected color space with the --show-space option. The additional --dry-run option terminates the program after the color space has been displayed:

$ gecos --show-space --dry-run --smin 30 --lmin 60 --lmax 70


The plot is a 2D projection of the color space at a fixed lightness. By default, the lightness value in the plot is the average of the --lmin and the --lmax value. The displayed lightness value can be customized with the --lightness option. The hole in the center of the plot is caused by the saturation constraint.

The --show-scheme option shows the symbol conformation in color space after the optimization. Again the plot is a 2D projection at a fixed lightness. The white area shows the allowed color space at the given lightness:

$ gecos --show-scheme --smin 30 --lmin 60 --lmax 70

Some symbols might seem to be outside of the allowed space, but remember that the white area is only the allowed space at the displayed lightness.

The --show-example option shows an example multiple protein sequence alignment with the color scheme.

$ gecos --show-example --smin 30 --lmin 60 --lmax 70


Finally, you can plot the progression of the score that Gecos tries to minimize over the course of the optimization. Note that a lower score means a better color conformation.