JanusGraph Deep Dive (Part 2): Demystify indexing

Composite Index

Composite Index Basics

The name “composite index” is easy to confuse with MySQL’s multi-column index, which is also called a composite index. In MySQL and some other databases, a composite index is an index that covers multiple columns. In JanusGraph, however, a composite index can cover one or more property keys, so it is a different concept. It helps to think of a JanusGraph composite index as a “regular index”: it behaves like an index you would create in a traditional database, except that it supports only equality lookups. For example, you can create a composite index on the “name” vertex property as follows:

// open a schema management session
mgmt = graph.openManagement()
// fetch the property key you want to index
name = mgmt.getPropertyKey('name')
// build a composite index against this property
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
// commit index changes
mgmt.commit()
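
To illustrate the “one or more” part, here is a minimal sketch of a second composite index that covers two keys, followed by the equality lookups each index can serve. It assumes an “age” property key already exists in the schema and that g is graph.traversal(); the index name byNameAndAgeComposite and the values 'hercules' and 30 are purely illustrative.

```groovy
// assumption: the 'name' and 'age' property keys are already defined in the schema
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
// a composite index may cover more than one key
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()

// assumption: g = graph.traversal()
// equality on 'name' alone: can be served by byNameComposite
g.V().has('name', 'hercules').toList()
// equality on both keys: can be served by byNameAndAgeComposite
g.V().has('name', 'hercules').has('age', 30).toList()
// equality on 'age' alone: byNameAndAgeComposite cannot serve this, because a
// composite index applies only when every one of its keys is constrained by equality
g.V().has('age', 30).toList()
```

Because composite indexes support only equality, a range condition such as has('age', gt(30)) cannot be answered by either index.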

Composite Index Internals

Composite indexes are stored in the same storage backend as your graph data (vertices and edges), but in a separate “store”: graph data lives in the edgestore store, while composite indexes live in the graphindex store. What a “store” is depends on your storage backend. With Cassandra, for example, graph data is stored in the edgestore table and composite indexes are stored in the graphindex table, both in the same keyspace.

Composite Index Caveats

Composite indexes work best with high-cardinality data, that is, properties with many distinct values (I try not to use the word “selectivity” because people often have opposite interpretations of it). For example, if hundreds of thousands of vertices share the same value for a property, you should generally consider a mixed index rather than a composite index. A common way to run into this: since JanusGraph does not have an option to index labels, you might create a property (let’s call it “type”) that essentially just replicates the label, and then naturally index it with a composite index. This works well as long as few vertices or edges share the same “type”, but with a million vertices of the same “type” you will likely encounter memory and/or performance problems. Why? Recall that composite indexes store your indexed values as partition keys. If one million vertices have the same “type”, there will be one million rows with the same partition key, which is a big anti-pattern in most databases. In this case, you should consider using a mixed index.
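
One possible way to follow that advice is sketched below, under a few assumptions: an index backend is registered under the name 'search' (configured through the index.search.* options), a String property key “type” already exists, and g is graph.traversal(). The index name byTypeMixed and the value 'person' are hypothetical. Mapping.STRING asks the index backend to treat the value as a single untokenized string, which suits exact-match lookups on a label-like property.

```groovy
// assumption: an index backend named 'search' is configured for this graph
mgmt = graph.openManagement()
// assumption: 'type' is an existing String property key that mirrors the label
type = mgmt.getPropertyKey('type')
// index 'type' in the external backend as an exact-match string
// (Mapping is org.janusgraph.core.schema.Mapping)
mgmt.buildIndex('byTypeMixed', Vertex.class).addKey(type, Mapping.STRING.asParameter()).buildMixedIndex('search')
mgmt.commit()

// exact-match lookup on the low-cardinality property, bounded with limit()
g.V().has('type', 'person').limit(10).toList()
```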

Mixed Index

Mixed Index Basics

A mixed index is an index stored in an external index backend, separate from the storage backend that holds your primary graph data. JanusGraph supports three index backends: Elasticsearch, Apache Solr, and Apache Lucene. Apache Lucene is a Java library that can only be used locally, on the machine where JanusGraph runs, so it is not applicable when you have multiple JanusGraph instances. Elasticsearch and Apache Solr are distributed engines built on top of Apache Lucene and are therefore suitable for a distributed setup. Mixed index creation is very similar to composite index creation, and you can learn more in the official documentation. Of course, if you don’t need mixed index functionality, you don’t need an external index backend at all.
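
For comparison with the composite index snippet, a minimal mixed index definition might look like the sketch below. It assumes the graph was opened with an index backend registered under the name 'search' (for example index.search.backend=elasticsearch plus a matching index.search.hostname) and that the “name” and “age” property keys already exist; the index name nameAndAge is illustrative.

```groovy
// assumption: index.search.backend (and its connection settings) are configured
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
// same builder as for a composite index, but finished with buildMixedIndex,
// which takes the name of the index backend that will store the index
mgmt.buildIndex('nameAndAge', Vertex.class).addKey(name).addKey(age).buildMixedIndex('search')
mgmt.commit()
```

The only structural difference from the composite case is the final call: buildMixedIndex('search') names the backend, whereas buildCompositeIndex() takes no argument because the index lives in the storage backend itself.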

Mixed Index Internals

Unlike composite indexes, mixed indexes are stored in your external index backend. The search features, performance, and scalability therefore all depend on your choice of index backend. In general, mixed indexes are more flexible than composite indexes and support condition predicates beyond equality.
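
A few queries that go beyond equality are sketched below. They assume a mixed index covering a String key “name” (with the default full-text mapping) and a numeric key “age”, and that g is graph.traversal(); the property values are illustrative. Which predicates are available for a given key depends on the index backend and the mapping chosen for that key.

```groovy
// full-text predicate: 'name' contains the token 'hercules'
g.V().has('name', textContains('hercules')).toList()
// numeric range predicates: age greater than 30, and age strictly between 20 and 50
g.V().has('age', gt(30)).toList()
g.V().has('age', inside(20, 50)).toList()
// conditions on several indexed keys can be combined in one query
g.V().has('name', textContains('hercules')).has('age', lt(50)).toList()
```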

Mixed Index Caveats

When hitting a composite index, you will likely find that a query such as the one below

g.V().has("propKey", "propValue").toList()
g.V().has("propKey", "propValue").range(0, 10000).toList()
g.V().has("propKey", "propValue").range(10000, 20000).toList()
...
g.V().has("propKey", "propValue").range(90000, 100000).toList()

Boxuan Li

Maintainer of JanusGraph, a popular distributed graph database