“We’re Building Something, Here, Detective, We’re Building It From Scratch. All The Pieces Matter.” (Lester, The Wire)

Reading lots of data with hemibrainr

In order to use this package to its fullest, you need to get hemibrainr to read large amounts of data stored on a Google drive. To do this, you have three options:

  1. Mount your Google drive using Google filestream
  2. Mount your Google drive using rclone
  3. Download the Google drive and save locally

The best option is to use google filestream. By default, this is what hemibrainr expects. However, you need a Google Workspace account (formerly G-Suite), which is a paid-for service. Without this, the best option is to use rclone. You can also download to an external hard drive and use that.

We have two Google team drives available for you to use, which contain similar data. One ("hemibrain") is for internal use by the Drosophila Connectomics Group. The other one ("hemibrainr") is shared with those who would like access. Contact us by email to request access.

hemibrainr_googledrive

Option 1: Google workspace, hemibrainr and R

Google drive is a data storage and synchronisation system. A Google team drive (shared drive) is a Google drive that is shared by a team of people, you can find them on your ‘shared drives’ in your Google authenticated account. Sharing and file ownership are managed for the entire team rather than one individual. Google Drive is a key component of Google Workspace, Google’s monthly subscription offering for businesses and organizations. As part of select Google Workspace plans, Drive offers unlimited storage, advanced file audit reporting, enhanced administration controls, and greater collaboration tools for teams. Google filestream allows users with access to a Google Workspace to mount their Google Drives like a hard-drive, to their local machines. It is this feature that the package hemibrain capitalises upon in order to store large amounts of fly connectomics related data, and enable users to quickly access it.

On the hemibrain Google team drive for the Drosophila Connectomics Group. If you do not have access to this team drive and would like to use it, to make the most our of hemibrainr, please get in contact. You will need top to have access to the drive and have Google filestream mounted. Then you will be able to:

  • Read thousands of pre-skeletonised flywire/hemibrain neurons from Google Drive
  • Read flywire/hemibrain NBLASTs and NBLASTs to hemibrain neurons
  • Read flywire/hemibrain neurons that are pre-transformed into a variety of brainspaces
  • Get processed hemibrain connectivity data, synapse positions
  • Get meta data for all flywire and hemibrain neurons
  • Get information on flywire/FAFB/hemibrain neuron-neuron matches between data sets and hemispheres.

You can also set up your own Google team drive in which to deposit data. We can help you get started, so get in contact. A starting point for creating the data are the .R scripts in /data-raw/ of the hemibrainr Github repository. You will need the file structure from hemibrainr:::hemibrainr_gdrive_structure.

Get your drive location

Let us have a look at the default hemibrainr settings, letting it to know where to find Google drive data:

# Load package
library(hemibrainr)

# Else, it wants to see it on the mounted team drive, here
options("Gdrive_hemibrain_data")

# Get hemibrain skeletons
db <- hemibrain_neurons(local=FALSE) 
## Will not work without google filestream

# You can also specify certain Google sheets
## Used for neuron-neuron matching
options("hemibrainr_matching_gsheet")

Set your drive location

If you have mounted your Google drives with Google file stream, you should be able to see something like this:

google_filestream

If you need to set hemibrainr to look at a new drive, use:

# Set a new Google drive, can be the team drive name or a path to the correct drive
hemibrainr_set_drive("hemibrainr")

# Now just get the name of your default team drive.
## This will be used to locate your team drive using the R package googledrive
hemibrainr_team_drive()

Option 2: rclone, hemibrainr and R

You can also use the freeware rclone to mount any Google drive, including your own or a shared/team drive. This does not depend on you have a paid-for Google workspace authorised account. Again, for this to work, you must have access to an appropriate google team drive for hemibrainr. Get in contact if you would like access, but lack it. One you have used rclone to mount your Google drive, you can interact with it like a normal file system.

Please do not delete anything on the Google drive! Or add anything to it unless you are sure.

Download rclone

First, download rclone for your operating system. You can also download from your system’s command line (e.g. from terminal):

# unix/macosx
curl https://rclone.org/install.sh | sudo bash

Configure rclone with hemibrainr Google drive

Once you have rclone installed, you will need to follow these instructions to configure rclone with a Google drive: this should either be the team drive ‘hemibrain’ or the team drive ‘hemibrainr’. For the defaults in this package to work, please name your remote drive ‘hemibrainr’.

Whilst configuring, you will be asked to supply some google credentials - you can proceed without doing this. However, I highly recommend obtaining your own google client id and client secret from the developer console associated with your preferred Google account. To do this, follow these detailed instructions. This allows you to operate with a higher API rate limit - if you proceed without you will be using the rclone credentials, which many users as operating with, and this might be slower.

In short, the configuration process in terminal is:

rclone config
# you will now be guided through the config on the command line:
No remotes found - make a new one
n) New remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
n/r/c/s/q> n
name> hemibrainr
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
XX / Google Drive
   \ "drive"
[snip]
Storage> drive
Google Application Client Id - leave blank normally.
client_id> 4246767787-52r215n6e0h9k0qpta8f4o6jkr7dgfha3.apps.googleusercontent.com #dummy example
Google Application Client Secret - leave blank normally.
client_secret> yiedGkPA-KLO_dqehctghgh #dummy example
Scope that rclone should use when requesting access from drive.
Choose a number from below, or type in your own value
 1 / Full access all files, excluding Application Data Folder.
   \ "drive"
 2 / Read-only access to file metadata and file contents.
   \ "drive.readonly"
   / Access to files created by rclone only.
 3 | These are visible in the drive website.
   | File authorization is revoked when the user deauthorizes the app.
   \ "drive.file"
   / Allows read and write access to the Application Data folder.
 4 | This is not visible in the drive website.
   \ "drive.appfolder"
   / Allows read-only access to file metadata but
 5 | does not allow any access to read or download file content.
   \ "drive.metadata.readonly"
scope> 1
ID of the root folder - leave blank normally.  Fill in to access "Computers" folders. (see docs).
root_folder_id> 
Service Account Credentials JSON file path - needed only if you want use SA instead of interactive login.
service_account_file>
Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine or Y didn't work
y) Yes
n) No
y/n> y
If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth
Log in and authorize rclone for access
Waiting for code...
Got code
Configure this as a team drive?
y) Yes
n) No
y/n> y
--------------------
[remote] # these are the specific details of your chosen drive
client_id = 
client_secret = 
scope = drive
root_folder_id = 
service_account_file =
token = {"access_token":"XXX","token_type":"Bearer","refresh_token":"XXX","expiry":"2014-03-16T13:57:58.955387075Z"}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y

Once you have done this, you can check it works using:

Use rclone-mounted Google drive with hemibrainr

Once you have installed and configured rclone, you can use it to mount the google team drive. To do this, you create an empty directory wherever you want, named whatever you like, and then rclone ‘turns’ this directory into your mounted Google drive.

You can mount with rclone from your system’s command line:

# unix/macosx
rclone mount hemibrainr: /path/to/local/mount

Try it somewhere like /Documents to see how it works. For hemibrainr there is an R function that will do the mounting for you in your working directory. So within R:

# mounts in working directory
hemibrainr_rclone()

# Now hemibrain neurons are read from this mount
db = hemibrain_neurons()
length(db)

# Specifically, from here
options("Gdrive_hemibrain_data")

# unmounts
hemibrainr_rclone_unmount()

# And now we are back to:
options("Gdrive_hemibrain_data")

Option 3: local data and hemibrainr

You can also download the contents of the drive and store it locally or on an external hard drive, and use that.

To get the folder structure hemibrainr expects, you can use:

hemibrainr:::hemibrainr_folder_structure()