Lightning-fast cluster computing with Spark and Cassandra
This library lets you expose Cassandra tables as Spark RDDs, write Spark RDDs to Cassandra tables, and execute arbitrary CQL queries in your Spark applications.
- Compatible with Apache Cassandra version 2.0 or higher and DataStax Enterprise 4.5
- Compatible with Apache Spark 1.0 and 1.1
- Exposes Cassandra tables as Spark RDDs
- Maps table rows to CassandraRow objects or tuples
- Offers customizable object mapper for mapping rows to objects of user-defined classes
- Saves RDDs back to Cassandra via an implicit saveToCassandra call
- Converts data types between Cassandra and Scala
- Supports all Cassandra data types including collections
- Filters rows on the server side via the CQL WHERE clause
- Allows for execution of arbitrary CQL statements
- Plays nice with Cassandra Virtual Nodes
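The features listed above can be illustrated with a minimal sketch. It assumes a Cassandra node reachable at 127.0.0.1 and a local Spark master; the keyspace `test`, the table `events`, its columns, and the object name `ConnectorSketch` are hypothetical names chosen for illustration only. Which `where` predicates the server accepts depends on the table schema and the Cassandra version, so treat the filter here as an example on a clustering column rather than a general recipe.

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.{SparkConf, SparkContext}

object ConnectorSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: Cassandra listens on 127.0.0.1 and Spark runs in local mode.
    val conf = new SparkConf(true)
      .setMaster("local[2]")
      .setAppName("connector-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")

    // Execute arbitrary CQL: create a hypothetical keyspace and table.
    CassandraConnector(conf).withSessionDo { session =>
      session.execute(
        "CREATE KEYSPACE IF NOT EXISTS test WITH replication = " +
          "{'class': 'SimpleStrategy', 'replication_factor': 1}")
      session.execute(
        "CREATE TABLE IF NOT EXISTS test.events " +
          "(sensor text, ts int, reading double, PRIMARY KEY (sensor, ts))")
    }

    val sc = new SparkContext(conf)
    try {
      // Save an RDD of tuples to Cassandra through the implicit saveToCassandra call.
      sc.parallelize(Seq(("a", 1, 0.5), ("a", 2, 0.7), ("b", 1, 1.2)))
        .saveToCassandra("test", "events", SomeColumns("sensor", "ts", "reading"))

      // Expose the table as an RDD of CassandraRow objects.
      val rows = sc.cassandraTable("test", "events")
      rows.collect().foreach { row =>
        println(s"${row.getString("sensor")} ${row.getInt("ts")} ${row.getDouble("reading")}")
      }

      // Map rows to tuples and push the filter down to Cassandra as a CQL WHERE clause.
      val recent = sc.cassandraTable[(String, Int, Double)]("test", "events")
        .select("sensor", "ts", "reading")
        .where("ts > ?", 1)
      recent.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The sketch needs the connector dependency on the classpath and a running Cassandra instance; it is not meant to stand in for the project's own documentation of these calls.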
This project has been published to the Maven Central Repository. For SBT to download the connector binaries, sources and javadoc, put this in your project SBT config:
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0"
If you want to access the connector's functionality from Java, you may also want to add the Java API module:
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.1.0"
In the root directory run
A fat jar will be generated to both of these directories:
Select the former for Scala apps, the latter for Java.
In the root directory run:
sbt package
sbt doc
The library package jars will be placed in:
The documentation will be generated to: