R/neuronlistfh.R
neuronlistfh.Rd
neuronlistfh objects consist of a list of neuron objects along with an optional attached dataframe containing information about the neurons. In contrast to neuronlist objects the neurons are not present in memory but are instead dynamically loaded from disk as required. neuronlistfh objects also inherit from neuronlist and therefore any appropriate methods e.g. plot3d.neuronlist can also be used on neuronlistfh objects.
neuronlistfh constructs a neuronlistfh object from a filehash, data.frame and keyfilemap. End users will not typically use this function to make a neuronlistfh. They will usually read them using read.neuronlistfh and sometimes create them by using as.neuronlistfh on a neuronlist object.
is.neuronlistfh tests whether an object is a neuronlistfh.
as.neuronlistfh is a generic function to convert an object to neuronlistfh.
as.neuronlistfh.neuronlist converts a regular neuronlist to one backed by a filehash object with an on disk representation.
c.neuronlistfh adds additional neurons from one or more neuronlist objects to a neuronlistfh object.
neuronlistfh(db, df, keyfilemap, hashmap = 1000L)
is.neuronlistfh(nl)
as.neuronlistfh(x, df, ...)
# S3 method for neuronlist
as.neuronlistfh(
  x,
  df = attr(x, "df"),
  dbdir = NULL,
  dbClass = c("RDS", "RDS2", "DB1"),
  remote = NULL,
  WriteObjects = c("yes", "no", "missing"),
  ...
)
# S3 method for neuronlistfh
c(..., recursive = FALSE)
db
A filehash object that manages an on disk database of neuron objects. See Implementation details.
df
Optional dataframe, where each row describes one neuron.
keyfilemap
A named character vector in which the elements are filenames on disk (managed by the filehash object) and the names are the keys used in R to refer to the neuron objects. Note that the keyfilemap defines the order of objects in the neuronlist and will be used to reorder the dataframe if necessary.
hashmap
A logical indicating whether to add a hashed environment for rapid object lookup by name, or an integer defining a threshold number of objects at which this will happen (see Implementation details).
nl
Object to test.
x
Object to convert.
...
Additional arguments for methods, eventually passed to the neuronlistfh() constructor.
dbdir
The path to the underlying filehash database on disk. For RDS formats, by convention this should be a path whose final element is 'data', which will be turned into a directory. For DB1 format it specifies a single file to which objects will be written.
dbClass
The filehash database class. Defaults to RDS.
remote
The url pointing to a remote repository containing files for each neuron.
WriteObjects
Whether to write objects to disk. "missing" implies that existing objects will not be overwritten. Default "yes".
recursive
Currently ignored.
a neuronlistfh object which is a character vector with classes neuronlistfh, neuronlist and attributes db, df. See Implementation details.
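To make the idea of "a character vector with classes and attributes" concrete, here is a minimal base R sketch of a lazily loaded list. The class name lazylist, the one-RDS-file-per-key layout and the `[[` method are hypothetical illustrations only; they are not how nat or filehash implement neuronlistfh.

```r
# Toy illustration only: "lazylist" is a hypothetical class, not part of nat.
# The object itself is just a character vector of keys; the data live on disk.
dbdir <- tempfile("lazydb")
dir.create(dbdir)

objs <- list(n1 = 1:3, n2 = c("a", "b"))
for (key in names(objs)) {
  saveRDS(objs[[key]], file.path(dbdir, paste0(key, ".rds")))
}

lazy <- structure(
  names(objs),
  names = names(objs),
  db = dbdir,                                   # stands in for the filehash object
  df = data.frame(type = c("gamma", "ab"), row.names = names(objs)),
  class = "lazylist"
)

# Elements are read from disk only when they are requested
`[[.lazylist` <- function(x, i) {
  keys <- unclass(x)
  key <- if (is.character(i)) i else keys[[i]]
  readRDS(file.path(attr(x, "db"), paste0(key, ".rds")))
}

lazy[[1]]      # loaded from disk by position
lazy[["n2"]]   # loaded from disk by key
```

Only the keys and the metadata data.frame stay in memory, which is why such an object is cheap to load even when it refers to many objects.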
The recommended way to add neurons to a neuronlistfh object is to use the c.neuronlistfh method to append one or more neuronlists to it. This ensures that the metadata data.frame attached to each object is handled properly. Use as nlfh <- c(nlfh, nl2). If you want to combine two neuronlistfh objects, it may make sense to choose the bigger one as the first-listed argument to which additional neurons are appended.
There is also low-level and quite basic support for modifying neuronlistfh objects using the [[ operator. There are two modes depending on the nature of the index in the assignment operation nlfh[[index]]<-neuron:
numeric index for replacement of items only
character index for replacement or addition of items
This distinction exists because a character key must be provided to name the neuron when a new one is being added, whereas an existing element can be referenced by position (i.e. the numeric index). Unfortunately the end user is responsible for manually modifying the attached data.frame when new neurons are added. Doing nlfh[[index]]<-neuron will do the equivalent of attr(nlfh, 'df')[i, ]=NA, i.e. add a row containing NA values.
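The manual metadata step can be sketched in base R on a plain data.frame. The columns type and side, the keys n1 to n3 and the values are made up for illustration; with a real object the data.frame would be the one returned by attr(nlfh, 'df').

```r
# Hypothetical metadata table standing in for attr(nlfh, 'df')
df <- data.frame(
  type = c("gamma", "ab"),
  side = c("L", "R"),
  row.names = c("n1", "n2")
)

# Adding a neuron under the new key "n3" leaves a row of NA values ...
df["n3", ] <- NA
placeholder <- df["n3", ]

# ... which the user then has to fill in by hand
df["n3", c("type", "side")] <- list("apbp", "L")
df
```

After editing, the modified table would need to be assigned back to the object's df attribute.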
neuronlistfh objects are a hybrid between regular neuronlist objects that organise data and metadata for collections of neurons and a backing filehash object. Instead of keeping objects in memory, they are always loaded from disk. Although this sounds like it might be slow, for nearly all practical purposes (e.g. plotting neurons) the time to read the neuron from disk is small compared with the time to plot the neuron; the OS will cache repeated reads of the same file. The benefits in memory and startup time (<1s vs 100s for our 16,000 neuron database) are vital for collections of 1000s of neurons e.g. for dynamic report generation using knitr or for users with <8 GB RAM or running 32-bit R.
neuronlistfh objects include:
attr(x,"keyfilemap")
A named character vector that determines the ordering of objects in the neuronlist and translates keys in R to filenames on disk. For objects created by as.neuronlistfh the filenames will be the md5 hash of the object as calculated using digest. This design means that the same key can be used to refer to multiple distinct objects on disk. Objects are effectively versioned by their contents. So if an updated neuronlistfh object is posted to a website and then fetched by a user, it will result in the automated download of any updated objects to which it refers.
attr(x,"db")
The backing database, typically of class filehashRDS. This manages the loading of objects from disk.
attr(x,"df")
The data.frame of metadata which can be used to select and plot neurons. See neuronlist for examples.
attr(x,"hashmap")
(Optional) A hashed environment which can be used for rapid lookup using key names (rather than numeric/logical indices). There is a space penalty to pay for this redundant lookup method, but it is normally worthwhile given that the dataframe object is typically considerably larger. To give some numbers, the additional environment might occupy ~ 1 ... reduce mean lookup time from 0.5 ms to 1 µs. Having located the object, on my machine it can take as little as 0.1 ms to load from disk, so these savings are relevant.
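The keyfilemap and hashmap ideas can be sketched in base R as a small content-addressed store. This is an illustration under stated assumptions, not nat's implementation: nat hashes the object itself with digest, whereas this sketch uses tools::md5sum on the serialised file as a stand-in, and the helpers put and fetch are hypothetical.

```r
store <- tempfile("store")
dir.create(store)

# Hypothetical helper: save an object under a file name derived from its md5
put <- function(obj) {
  tmp <- tempfile(tmpdir = store)
  saveRDS(obj, tmp)
  md5 <- unname(tools::md5sum(tmp))
  file.rename(tmp, file.path(store, md5))
  md5
}

# Names are the keys used in R; values are the file names on disk
keyfilemap <- c(n1 = put(1:3), n2 = put(c(x = 1.5)))

# Hypothetical helper: translate key -> file name -> object
fetch <- function(key) readRDS(file.path(store, keyfilemap[[key]]))

# Updating n1 writes a new file; the old version stays on disk, so the same
# key can refer to distinct objects over time
old_file <- keyfilemap[["n1"]]
keyfilemap[["n1"]] <- put(4:6)
fetch("n1")

# Optional hashed environment giving key lookup without scanning names()
hashmap <- list2env(as.list(keyfilemap), hash = TRUE)
hashmap[["n2"]]
```

Because file names depend on content, a client comparing an updated keyfilemap with its local store only needs to fetch the files it does not already have.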
Presently only backing objects which extend the filehash class are supported (although in theory other backing objects could be added). These include:
filehash RDS
filehash RDS2 (experimental)
filehash DB1 (experimental)
We have also implemented a simple remote access protocol (currently only for the RDS format). This allows a neuronlistfh object to be read from a url and downloaded to a local path. Subsequent attempts to access neurons stored in this list will result in automated download of the requested neuron to the local cache.
An alternative backend, the experimental RDS2 format, is supported (available at https://github.com/jefferis/filehash). This is likely to be the most effective for large (5,000-500,000) collections of neurons, especially when using network filesystems (NFS, AFP) which are typically very slow at listing large directories.
Finally the DB1 backend keeps the data in a single monolithic file on disk. This may work better when there are many small neurons (think >10,000 files occupying only a few GB) on NFS network file systems or Google Drive, neither of which are keen on having many files, especially in the same folder. It does not allow updates from a remote location. See filehashDB1-class for more details.
Note that objects are stored in a filehash, which by definition does not have any ordering of its elements. However neuronlist objects (like lists) do have an ordering. Therefore the names of a neuronlistfh object are not necessarily the same as the result of calling names() on the underlying filehash object.
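A small base R sketch of this ordering point, assuming a directory of files stands in for the filehash database. The keys and the short file names are hypothetical.

```r
# Keys (names) in list order, mapped to hypothetical file names on disk
keyfilemap <- c(zeta = "1f", alpha = "e0", mid = "7a")

store <- tempfile("store")
dir.create(store)
invisible(file.create(file.path(store, keyfilemap)))

names(keyfilemap)    # the order of the list, as defined by the keyfilemap
list.files(store)    # whatever order the backing store reports

# The list order is always recovered from the keyfilemap, never from the store
unname(keyfilemap[names(keyfilemap)])
```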
Other neuronlistfh: [.neuronlistfh(), read.neuronlistfh(), remotesync(), write.neuronlistfh()
Other neuronlist: *.neuronlist(), is.neuronlist(), neuronlist-dataframe-methods, neuronlistz(), neuronlist(), nlapply(), read.neurons(), write.neurons()
if (FALSE) {
  kcnl=read.neuronlistfh('http://jefferislab.org/si/nblast/flycircuit/kcs20.rds',
    'path/to/my/project/folder')
  # this will automatically download the neurons from the web the first time
  # it is run
  plot3d(kcnl)

  kcfh <- as.neuronlistfh(kcs20[1:18])
  # add more neurons
  kcfh <- c(kcfh, kcs20[19], kcs20[20])
  # convert back to regular (in memory) neuronlist
  all.equal(as.neuronlist(kcfh), kcs20)
}
if (FALSE) {
  # create neuronlistfh object backed by filehash with one file per neuron
  # by convention we create a subfolder called data in which the objects live
  kcs20fh=as.neuronlistfh(kcs20, dbdir='/path/to/my/kcdb/data')
  plot3d(subset(kcs20fh,type=='gamma'))
  # ... and, again by convention, save the neuronlistfh object next to the
  # filehash backing database
  write.neuronlistfh(kcs20fh, file='/path/to/my/kcdb/kcdb.rds')

  # in a new session
  kcs20fh=read.neuronlistfh("/path/to/my/kcdb/kcdb.rds")
  plot3d(subset(kcs20fh, type=='gamma'))

  # using the DB1 backing store (a single file on disk for all objects)
  kcs20fh=as.neuronlistfh(kcs20, dbdir='/path/to/my/kcdb/kcs20fh', dbClass='DB1')
  # store metadata on disk
  write.neuronlistfh(kcs20fh, file='/path/to/my/kcdb/kcs20fh.rds')
  # read in again in a new session. You will need these two files
  # kcs20fh kcs20fh.rds
  kcs20fh2 <- read.neuronlistfh("/path/to/my/kcdb/kcs20fh.rds")
}