“Oh, but nonsense, she thought; William must marry Lily. They have so many things in common. Lily is so fond of flowers. They are both cold and aloof and rather self-sufficing. She must arrange for them to take a long walk together.” (Virginia Woolf, To The Lighthouse)

# Neuron Match Making

Insect brains seem pretty stereotyped. But just how stereotyped are they? It comes as a surprise to many neuroscientists who work only on vertebrates to learn that, in insects, individual neurons can readily and reliably be re-found and identified across different members of a species, and perhaps even across species.

As of 2020, two large data sets are available for the vinegar fly, D. melanogaster: the hemibrain and FAFB. Together they make it possible to look at the full morphology of ~25,000 neurons in two different brains. However, neurons in FAFB have been semi-manually or manually reconstructed, making the automatic assignment of FAFB-hemibrain neuron matches non-trivial. In this R package we have built tools to enable users to record and deploy inter-dataset matches.

What use is this information? Matches could be used to look at morphological stereotypy, help find genetic lines that label neurons, help transfer information associated with one reconstructed neuron to the same cell in a different brain, compare neuron connectivity between two brains, etc.

For example, by matching neurons up between the hemibrain and FAFB, we see that the numbers of cell_types within one ‘hemilineage’ (a set of neurons that are born and develop together) are comparable between these two different flies:

In order to match neurons, we make use of other natverse tools to ‘bridge’ data between two different brain spaces, so they can be co-visualised (enabled by template brain and bridging registrations by Bogovic et al. 2019):

## Overview

In general, our interactive matching pipelines follow this workflow (this example is for fafb_matching):

## What You Need

In order to use these tools you will need RStudio and an installation of the natverse. To use them to maximum effect, you will also need permission to access the FAFB CATMAID v14 project, although some neurons can be read publicly from Virtual Fly Brain’s CATMAID project for FAFB and should have the same unique skeleton ID numbers. Our pipeline functions make use of the Google Filestream application, which should be installed on your machine. Further, note that neurons are read from the FAFB CATMAID project, so you must have login details for this project recorded in your .Renviron (edit with: usethis::edit_r_environ()) for these functions to work. For help, see here and here.
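As a quick sanity check before starting, the sketch below reports which CATMAID login variables are missing from your R environment. The variable names are an assumption based on the rcatmaid convention, and `missing_catmaid_vars` is a hypothetical helper, not part of this package. Not every server needs all four values, so treat the output as a hint rather than a hard requirement.

```r
# Environment variable names follow the rcatmaid convention (an assumption
# here); adjust them if your setup uses different names.
catmaid_vars <- c("catmaid_server", "catmaid_authname",
                  "catmaid_authpassword", "catmaid_token")

# Hypothetical helper: return the names of variables that are unset or empty
missing_catmaid_vars <- function(vars = catmaid_vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

missing_vars <- missing_catmaid_vars()
if (length(missing_vars) > 0) {
  message("Not set in .Renviron: ", paste(missing_vars, collapse = ", "))
} else {
  message("All CATMAID login variables are set.")
}
```

Remember to restart R after editing .Renviron, since the file is only read at start-up.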

### Authorisation

In order to write neuron matches to the project you must have access to the hemibrain Google Drive or the match making Google sheet (see below) owned by the Drosophila Connectomics Group. If you do not have access but would like to help or use this information, get in contact! You do not need programming skills to help us match make neurons, as we have written an interactive pipeline in R which does most of the work for you (see below).

We also regularly update a data frame saved in this package, as a snapshot of the matches that have been made. Without authorisation you can access these matches, but they may not be the most up-to-date:

# Load package
library(hemibrainr)
# See matches
View(hemibrain_matched)

We in the Drosophila Connectomics Group have been recording our match making in a Google sheet named em_matching. This sheet has two tabs of concern here, hemibrain for hemibrain neuron -> FAFB neuron matches and fafb for FAFB neuron to hemibrain neuron matches.

If you have authorisation, you can see the most up-to-date matches like so:

matches = hemibrain_matches() # You will be asked to log-in through a Google-enabled email address.
View(matches) # matches is a data frame, not a function

As you can see, other meta information is present in the data frame matches. The function hemibrain_matches has an argument called priority. This specifies whether to use FAFB->hemibrain matches (FAFB) or hemibrain->FAFB matches (hemibrain) in order to ascribe cell_type names to FAFB neurons. In both cases, cell_type names are attached to hemibrain bodyids, and propagated to their FAFB matches.

## Match Quality

Once a match is recorded, the user selects a quality for that match. There can be no match (none), a tract-only match (tract), a poor match (poor), an okay match (medium) or an exact match (good). As a rule of thumb, a poor match could be a neuron from a very similar cell_type, or a highly under-traced neuron that may be the correct cell_type. An okay match should be a neuron that looks to be from the same morphological cell_type, but there may be some discrepancies in its arbour. A good match is a neuron that corresponds well between FAFB and the hemibrain data. A tract-only match just means that the matched neuron should share the same cell body fiber, and therefore the same developmental ontogeny, even if the rest of its morphology is quite different.

It is very important to note that a match cannot be a match if neurons do not seem to share the same cell body fiber tract. Being in a different tract is a deal breaker.
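When working with recorded qualities in R, it can be convenient to treat them as an ordered factor. The sketch below uses a small toy data frame (invented for illustration, not real matches) and assumes the ordering none < tract < poor < medium < good implied by the descriptions above.

```r
# Toy data for illustration only; these are not real matches
toy <- data.frame(
  bodyid  = c("1001", "1002", "1003", "1004", "1005"),
  quality = c("good", "poor", "none", "medium", "tract"),
  stringsAsFactors = FALSE
)

# Order the quality labels from weakest to strongest (assumed ordering)
quality_levels <- c("none", "tract", "poor", "medium", "good")
toy$quality <- factor(toy$quality, levels = quality_levels, ordered = TRUE)

# Count matches per quality level
quality_counts <- table(toy$quality)
print(quality_counts)

# Keep only proper matches, i.e. those rated at least 'poor'
proper <- toy[toy$quality >= "poor", ]
print(proper$bodyid)
```

Using an ordered factor means that thresholds such as "at least poor" can be written as a simple comparison rather than a list of accepted labels.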

Some good matches are striking. For example:

In the above case, the FAFB neuron has been quite extensively manually traced, meaning that these cells look very similar to one another.

Be aware that while neurons must share the same cell body fiber tract, these tracts can be a little offset. For example, this is also a good match:

If the soma is missing, it might be safer to note a match as ‘medium’.

You might also use medium if you have a nice looking match and suspect that there is a medium/large discrepancy because the FAFB neuron (here shown in red) is under-traced, such as:

Or:

Bear in mind that the hemibrain volume only covers ~1/4 of the fly mid-brain, so neurons are truncated (here hemibrain neuron in black) but we can still make matches for many of them:

A larger degree of under-tracing may lead you to assign a match as poor. In this case, you think the two neurons may be ‘the same isomorphic cell_type’ but you could be wrong. For example:

A poor match may also be made if you think there is a slight offset, possibly due to a registration issue:

Though in this case, choosing an even lesser-traced FAFB neuron may be better:

A poor match can be given even to very under-traced FAFB neurons:

And even fragments if you are convinced the morphology is unique enough (but be careful!):

So far we have matched up a few thousand neurons. About 25 thousand matches are possible because that is the number of reconstructed neurons in the hemibrain data set. You can help us (and yourself!) by adding matches to our database. There are two main ways of doing this:

You can add matches you have already made by your own means. For this, you will need to get a data frame into R (e.g. reading from a .csv file) that has three columns: bodyid, which contains the hemibrain neurons’ unique Body IDs, skid which has the skeleton IDs for FAFB CATMAID neurons, and quality which gives a qualitative assessment of match quality (see above). If in doubt, put poor.

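made.matches = read.csv("my_matches.csv") # Must have the named columns: bodyid, skid, quality
# hemibrain_matches() only reads matches; hemibrain_add_made_matches() is the function that writes them
hemibrain_add_made_matches(df = made.matches, direction = "both") # direction controls which tabs matches get written to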
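Before uploading, it may be worth checking that your data frame has the expected shape. The following is a minimal sketch, not part of this package: `check_made_matches` is a hypothetical helper, the IDs in the toy data are placeholders, and the accepted quality labels are taken from the Match Quality section above.

```r
# Hypothetical helper: validate a data frame of matches before uploading
check_made_matches <- function(df) {
  required <- c("bodyid", "skid", "quality")
  absent <- setdiff(required, colnames(df))
  if (length(absent) > 0) {
    stop("Missing column(s): ", paste(absent, collapse = ", "))
  }
  allowed <- c("none", "tract", "poor", "medium", "good")
  bad <- !(df$quality %in% allowed)
  if (any(bad)) {
    stop("Unrecognised quality value(s): ",
         paste(unique(df$quality[bad]), collapse = ", "))
  }
  invisible(df)
}

# Toy example with placeholder IDs (not real neurons)
toy.matches <- data.frame(
  bodyid  = c("1001", "1002"),
  skid    = c("1", "2"),
  quality = c("good", "poor"),
  stringsAsFactors = FALSE
)
checked <- check_made_matches(toy.matches)
```

When reading your own file, consider `read.csv("my_matches.csv", colClasses = "character")` so that long IDs are kept as text rather than converted to numbers.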

Sometimes you cannot add a match because your neuron does not exist in the first column of either the hemibrain tab or the fafb tab of our Google sheet. In these cases, if you have a valid ID, you can either add it to the sheet manually, or programmatically so that all the right meta data is easily included:

# Add a missing FAFB projection neuron, so we can match it later:
hemibrain_matching_add(ids = "16", sheet = "fafb", User = "ASB")
## the sheet argument specifies the worksheet or 'tab' on the Google sheet we want to add to

## The Match Making Pipeline

You can also use our interactive pipeline to match neurons between hemibrain and FAFB. There are two versions of this pipeline: one takes hemibrain neurons from neuPrint and tries to find the best match for each hemibrain neuron (hemibrain_matching), and the other takes FAFB neurons from CATMAID and tries to find the best hemibrain match for those FAFB neurons (fafb_matching). There is also a third pipeline (LR_matching) that takes advantage of the fact that the FAFB data set has two intact hemispheres, and invites you to match up FlyWire neurons between the two hemispheres. Once matches are made, the results become available with hemibrain_matches.

The Google sheet is set up with a limited number of users, each of whom has been assigned a number of neurons to match up. In order to add yourself as a user, simply open this Google Sheet in your browser and add your initials to neurons of your choosing in the rightmost column ‘Users’.

For a video tutorial, see here.

# install package to bridge neurons between FAFB14 and hemibrain space
if (!requireNamespace("remotes")) install.packages("remotes")
remotes::install_github('natverse/nat.jrcbrains')

# Match hemibrain neurons!
hemibrain_matching() # Automatically, you can choose a User ID and you are given neurons that have this ID in the User column on the Google Sheet.
hemibrain_matching(ids=c("674108632","739256609")) # Otherwise you can select specific IDs
hemibrain_matching(ids=c("674108632","739256609"), overwrite = TRUE) # If a match has already been made you can overwrite it
# Otherwise neurons that have already been given a match will not be shown in the pipeline.
hemibrain_matching(overwrite = "none") # We can also set the pipeline to 'overwrite' cases where 'none' and 'tract' are the given match quality, i.e. re-look at cases where a proper match could not be made.

# Match FAFB neurons!
fafb_matching()
fafb_matching(ids = "16") # Specify IDs
fafb_matching(ids = "16", overwrite = TRUE) # Overwrite
fafb_matching(ids = "16", overwrite = "none") # Re-look only if no proper match, or just a tract-only match, was found before.

When you run these functions you will enter an interactive pipeline in an rgl window. Prompts will be given to you in your R console and you can rotate and pan in the window to see neurons. The neuron selected for matching is shown in blue (i.e. if using hemibrain_matching this will be a hemibrain neuron), and potential matches in red (i.e. if using hemibrain_matching these will be FAFB neurons). Potential matches are presented in order of NBLAST score (a measure of morphological similarity). Usually, for reasonably traced FAFB neurons, a good match appears in the top 10 hits.
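To illustrate what ranking by NBLAST score means, here is a toy sketch in base R. The scores and candidate names are invented for illustration, and this is not how the pipeline itself is implemented; it only shows the idea of sorting candidates so that the most similar come first.

```r
# Hypothetical NBLAST scores for one query neuron against six candidates
# (higher means more morphologically similar)
scores <- c(cand_a = 0.12, cand_b = 0.71, cand_c = -0.05,
            cand_d = 0.64, cand_e = 0.33, cand_f = 0.58)

# Rank candidates from most to least similar and keep the top hits
top_n <- 3
ranked <- sort(scores, decreasing = TRUE)
top_hits <- head(ranked, top_n)
print(names(top_hits))
```

In the real pipeline you review the candidates visually, so a high score is a starting point for inspection rather than proof of a match.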

## Re-setting and transferring

You can transfer matches between the hemibrain and fafb tabs, as well as refresh these tabs with the most up-to-date meta data using the following code. You will be manipulating the Google sheet for all users, so please use with caution. If in doubt, do not use.

# Add all hemibrain neurons to sheet
hemibrain_matching_rewrite()

# Add all FAFB neurons with lineage annotations to sheet
fafb_matching_rewrite()

# Transfer matches made in one sheet to the other
hemibrain_matching_transfers()

# Get hemibrain information into v14 CATMAID for matches, as annotations:
matches = hemibrain_matches()
matches = subset(matches, match.quality %in% c("good","medium","poor") & dataset == "FAFB")
skds = rownames(matches)
fafb_hemibrain_annotate(skds)

## Uses

One use we have already found for all of this match making is to cross-identify neuron cell body fiber tracts and (hemi)lineages. This means that we now have the locations in FAFB for different known sets of cells. You can see seed planes for them here.