neuPrint includes both an API which provides a range of queries as well as the option to send custom queries written in the Cypher query language of the Neo4j graph database.
It is probably worth making queries via the API if they will solve your problem. However custom queries offer maximum flexibility.
It is a great idea to wrap your queries in a function. This will make your code cleaner and more reusable. But do check the documentation to ensure that there isn’t already a neuprintr function that does the job. If there is something that almost looks correct, then consider asking us to adapt it!
We’ll start by giving a basic example query function. Most of the time you should just be able to edit this example without worrying about the details. The subsequent sections give you some information about those details.
This function takes one or more bodyids as input and returns the name of the neurons.
neuprint_get_neuron_names <- function(bodyids, dataset = NULL, all_segments = FALSE,
conn = NULL, ...) {
bodyids = neuprint_ids(bodyids, conn = conn, dataset = dataset)
all_segments.json = ifelse(all_segments,"Segment", "Neuron")
cypher = sprintf("WITH %s AS bodyIds UNWIND bodyIds AS bodyId MATCH (n:`%s`) WHERE n.bodyId=bodyId RETURN n.instance AS name",
id2json(bodyids),
all_segments.json)
nc = neuprint_fetch_custom(cypher=cypher, conn = conn, dataset = dataset, ...)
d = unlist(lapply(nc$data,nullToNA))
names(d) = bodyids
d
}
bodyids
is the main input argument. These typically come
in as either character vector or numeric format. By passing the argument
to the neuprint_ids()
function you can also enable a range
of queries to define the bodyids. The internal function
id2json()
is eventually used to look after formatting them
appropriately for the Neo4j cypher query.
In this function, cypher
is the actual query written in
the Cypher
query language. One helpful tip. You can press the i key in
neuPrint explorer to reveal the cypher query!
Many functions that operate on neurons will have an argument
controlling whether they operate only for larger objects (aka
Neuron) or also on fragments (aka Segment).
Restricting queries to Neuron can result in big speed-ups in
some cases. You can provide an all_segments
argument to
handle this option.
Finally the results that come back from
neuprint_fetch_custom
are typically in a big list object.
For simple results, you can just unlist()
this to make a
vector. In this case a function nullToNA
is first applied
in order to ensure that any values that come back as NULL
are converted to NA
; this is necessary to ensure that you
can make a vector - vector objects can only contain NA
s not
NULL
s.
As you can see this step uses the sprintf
function to
interpolate variables into a string. You could do this in other ways
(e.g. the paste()
function or the glue()
package). You may need to watch out for quoting issues if you should
need to use single or double quotes inside your queries.
You need to be a little careful with the handling of body ids. The
basic advice is to pass them to neuprint_ids()
on entry to
your function. This will ensure that there is at least one id (unless
mustWork=FALSE
) and convert to character vector. It also
allows simple queries (by default exact matches against the type field).
Finally it will by default ensure that only unique ids are passed in.
This is nearly always what you want, but be careful.
In more detail, body ids are 64 bit integers (often called
bigint
s), which are commonly used as keys in databases.
However like many programming languages R does not have a native 64 bit
integer type. R’s default numeric format is double width floating point.
This can exactly handle numbers with up to 53 bits of precision.
However, the biggest integer (here data is represented in signed
format int64
) that we need to worry about is (2^63)-1 =
9,223,372,036,854,775,807. This cannot be represented as a numeric. I
have yet to spot a body id in this upper range, but the
neuprint_ids
function will take care that you do not use
one. If you need to specify a large bodyid as input to your function
then you should insist on
bit64::integer64()
The latter seems better in theory (since it is more compact) than
using character vectors, but it relies on an add-on package (bit64)
rather than base R and I have seen subtle errors when
integer64
objects lose their class. In particular you
cannot turn them into a list or run lapply
without losing
their class. Therefore I recommend using character vectors where
possible.
Bottom line
neuprint_ids()
to your incoming body ids
to enable a range of simple queries and check that your input looks
sensible.id2json()
on your body ids when building
your cypherneuprint_ids()
, then at least use
id2char
to put them into the most robust standard form (a
character vector).In order to make a query, you need to specify which neuPrint
server you want to talk to as well as the dataset you
would like to use. The server is specified by a
neuprint_connection()
object. 99% of the time you will not
need to do anything as this will be handled transparently by a one time
user setting to specify their preferred server and authentication
token.
The same is true of the dataset parameter, with the additional
feature that that neuprint_fetch_custom()
will internally
check for a default dataset for the current server (as defined by the
conn
connection object).
However your function must allow people to specify both of these things if they wish. Therefore your function should look something like this in outline:
myquery <- function(query, conn=NULL, dataset=NULL, ...) {
neuprint_fetch_custom(query, conn=conn, dataset=dataset, ...)
}
Internally neuprint_fetch_custom()
, which will look
after the dataset
argument, and then go on to call the low
level neuprint_fetch()
, which will ensure that the
connection object is valid (or use neuprint_login()
to make
one).
The sample function just ran unlist()
on the list
returned by neuprint_fetch_custom()
. This is fine in many
cases. You can also pass the simplifyVector
argument to
neuprint_fetch_custom()
and this will clean up many forms
of list. See jsonlite::fromJSON()
for details of how this
operates.
If you are getting a nested list specifying a table (i.e. a
data.frame
or spreadsheet like result) then there is a
function neuprint_list2df()
which will do a lot of the
heavy lifting. It is strongly recommended to make use of this whenever
possible. See neuprint_get_meta()
for an example. Note that
neuprint_list2df()
has an argument with default
stringsAsFactors=FALSE
to ensure that columns in the output
data.frame
will be character vectors unless you take steps
to make them factors.
Feel free to ask for help on the nat-user google group or by making an issue.