ArangoDB Spark Connector - Java Reference
This library has been deprecated in favor of the new ArangoDB Datasource for Apache Spark.
ArangoSpark.save
ArangoSpark.save[T](rdd: JavaRDD[T], collection: String, options: WriteOptions)
ArangoSpark.save[T](dataset: Dataset[T], collection: String, options: WriteOptions)
Saves data from an RDD or Dataset into ArangoDB.
Arguments
- rdd: JavaRDD[T]
  The rdd with the data to save
- collection: String
  The collection to save in
- options: WriteOptions
  - database: String
    Database to write into
  - hosts: String
    Alternative hosts to context property arangodb.hosts
  - user: String
    Alternative user to context property arangodb.user
  - password: String
    Alternative password to context property arangodb.password
  - useSsl: Boolean
    Alternative useSsl to context property arangodb.useSsl
  - sslKeyStoreFile: String
    Alternative sslKeyStoreFile to context property arangodb.ssl.keyStoreFile
  - sslPassPhrase: String
    Alternative sslPassPhrase to context property arangodb.ssl.passPhrase
  - sslProtocol: String
    Alternative sslProtocol to context property arangodb.ssl.protocol
  - method: WriteOptions.Method
    Write method to use, it can be one of:
    - WriteOptions.INSERT$.MODULE$
    - WriteOptions.UPDATE$.MODULE$
    - WriteOptions.REPLACE$.MODULE$
Examples
JavaSparkContext sc = ...
List<MyBean> docs = ...
JavaRDD<MyBean> documents = sc.parallelize(docs);
ArangoSpark.save(documents, "myCollection", new WriteOptions().database("myDB"));
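The connection-related options above are each described as an alternative to a context property. The following minimal sketch illustrates that idea in plain Java, assuming that an explicitly set option takes precedence over the corresponding context property. The class OptionFallbackSketch, the method resolve, and the host names are hypothetical and are not part of the connector.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration only; not part of the ArangoDB Spark Connector. */
final class OptionFallbackSketch {

    /**
     * Returns the per-call override if one is present, otherwise the value of
     * the context property (null if the property is not set either).
     * Assumption: an explicit option wins over the context property.
     */
    static String resolve(Optional<String> override,
                          Map<String, String> contextProperties,
                          String key) {
        return override.orElseGet(() -> contextProperties.get(key));
    }

    public static void main(String[] args) {
        // Stand-in for properties configured on the Spark context.
        Map<String, String> context = new HashMap<>();
        context.put("arangodb.hosts", "127.0.0.1:8529");

        // No override given: the context property is used.
        System.out.println(resolve(Optional.empty(), context, "arangodb.hosts"));

        // Override given: the per-call value is used instead.
        System.out.println(
            resolve(Optional.of("db1.example.com:8529"), context, "arangodb.hosts"));
    }
}
```

Under this assumption, a value passed through WriteOptions would only affect the call it is passed to, while the context property would remain the default for every other call.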
Very Large Datasets
To prevent errors on very large datasets (more than one million objects), use repartition to split the data into smaller chunks before saving:
ArangoSpark.save(allEdges.toJSON().repartition(20000), "mio_edges", writeOptions);
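How many partitions to request depends on how many documents each chunk should hold. The sketch below shows one way to derive a partition count from a total document count and a target chunk size, assuming documents are spread roughly evenly across partitions. The class PartitionSizingSketch, the method partitionsFor, and the chunk size of 50 are hypothetical and chosen only for illustration; they are not recommendations from the connector.

```java
/** Hypothetical illustration only; not part of the connector or of Spark. */
final class PartitionSizingSketch {

    /**
     * Smallest partition count such that, with an even spread, no partition
     * holds more than maxDocsPerPartition documents. Always at least 1.
     */
    static int partitionsFor(long totalDocs, int maxDocsPerPartition) {
        if (totalDocs < 0 || maxDocsPerPartition <= 0) {
            throw new IllegalArgumentException(
                "totalDocs must be >= 0 and maxDocsPerPartition must be > 0");
        }
        // Ceiling division without floating point.
        long partitions = (totalDocs + maxDocsPerPartition - 1) / maxDocsPerPartition;
        return (int) Math.max(1L, Math.min(partitions, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        // One million documents in chunks of at most 50 documents each.
        System.out.println(partitionsFor(1_000_000L, 50)); // prints 20000
    }
}
```

The resulting number could then be passed to repartition in place of a hard-coded value.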
ArangoSpark.saveDF
ArangoSpark.saveDF(dataframe: DataFrame, collection: String, options: WriteOptions)
Saves data from a DataFrame into ArangoDB.
Arguments
- dataframe: DataFrame
  The dataFrame with the data to save
- collection: String
  The collection to save in
- options: WriteOptions
  - database: String
    Database to write into
  - hosts: String
    Alternative hosts to context property arangodb.hosts
  - user: String
    Alternative user to context property arangodb.user
  - password: String
    Alternative password to context property arangodb.password
  - useSsl: Boolean
    Alternative useSsl to context property arangodb.useSsl
  - sslKeyStoreFile: String
    Alternative sslKeyStoreFile to context property arangodb.ssl.keyStoreFile
  - sslPassPhrase: String
    Alternative sslPassPhrase to context property arangodb.ssl.passPhrase
  - sslProtocol: String
    Alternative sslProtocol to context property arangodb.ssl.protocol
  - method: WriteOptions.Method
    Write method to use, it can be one of:
    - WriteOptions.INSERT$.MODULE$
    - WriteOptions.UPDATE$.MODULE$
    - WriteOptions.REPLACE$.MODULE$
Examples
JavaSparkContext sc = ...
List<MyBean> docs = ...
JavaRDD<MyBean> documents = sc.parallelize(docs);
SQLContext sql = SQLContext.getOrCreate(sc);
DataFrame df = sql.createDataFrame(documents, MyBean.class);
ArangoSpark.saveDF(df, "myCollection", new WriteOptions().database("myDB"));
ArangoSpark.load
ArangoSpark.load[T](sparkContext: JavaSparkContext, collection: String, options: ReadOptions, clazz: Class[T]): ArangoJavaRDD[T]
Loads data from ArangoDB into an RDD.
Arguments
- sparkContext: JavaSparkContext
  The sparkContext containing the ArangoDB configuration
- collection: String
  The collection to load data from
- options: ReadOptions
  - database: String
    Database to read from
  - hosts: String
    Alternative hosts to context property arangodb.hosts
  - user: String
    Alternative user to context property arangodb.user
  - password: String
    Alternative password to context property arangodb.password
  - useSsl: Boolean
    Alternative useSsl to context property arangodb.useSsl
  - sslKeyStoreFile: String
    Alternative sslKeyStoreFile to context property arangodb.ssl.keyStoreFile
  - sslPassPhrase: String
    Alternative sslPassPhrase to context property arangodb.ssl.passPhrase
  - sslProtocol: String
    Alternative sslProtocol to context property arangodb.ssl.protocol
- clazz: Class[T]
  The type of the document
Examples
JavaSparkContext sc = ...
ArangoJavaRDD<MyBean> rdd = ArangoSpark.load(sc, "myCollection", new ReadOptions().database("myDB"), MyBean.class);
ArangoJavaRDD.filter
ArangoJavaRDD.filter(condition: String): ArangoJavaRDD[T]
Adds a filter condition. If used multiple times, the conditions will be combined with a logical AND.
Arguments
- condition: String
  The condition for the filter statement. Use doc inside to reference the document, e.g. "doc.name == 'John'"
Examples
JavaSparkContext sc = ...
ArangoJavaRDD<MyBean> rdd = ArangoSpark.load(sc, "myCollection", new ReadOptions().database("myDB"), MyBean.class);
ArangoJavaRDD<MyBean> rddFiltered = rdd.filter("doc.test <= 50");
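To make the AND semantics of repeated filter calls concrete, the following plain-Java sketch models a chain of conditions and renders them as a single AQL-style query string. It is a conceptual illustration only and does not show how the connector builds its queries internally. The class FilterChainSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not the connector's implementation. */
final class FilterChainSketch {
    private final String collection;
    private final List<String> conditions;

    FilterChainSketch(String collection) {
        this(collection, Collections.emptyList());
    }

    private FilterChainSketch(String collection, List<String> conditions) {
        this.collection = collection;
        this.conditions = conditions;
    }

    /** Returns a new chain with the condition appended; the receiver is unchanged. */
    FilterChainSketch filter(String condition) {
        List<String> next = new ArrayList<>(conditions);
        next.add(condition);
        return new FilterChainSketch(collection, next);
    }

    /** Renders all conditions joined with a logical AND. */
    String toQuery() {
        StringBuilder sb = new StringBuilder("FOR doc IN ").append(collection);
        if (!conditions.isEmpty()) {
            sb.append(" FILTER (")
              .append(String.join(") AND (", conditions))
              .append(")");
        }
        return sb.append(" RETURN doc").toString();
    }

    public static void main(String[] args) {
        String query = new FilterChainSketch("myCollection")
            .filter("doc.test <= 50")
            .filter("doc.name == 'John'")
            .toQuery();
        // prints: FOR doc IN myCollection FILTER (doc.test <= 50) AND (doc.name == 'John') RETURN doc
        System.out.println(query);
    }
}
```

In this model each call returns a new object, so an unfiltered chain can be reused as the starting point for several differently filtered variants.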
Spark Streaming Integration
RDDs can also be saved to ArangoDB from Spark Streaming using ArangoSpark.save().
Example
javaDStream.foreachRDD(rdd ->
ArangoSpark.save(rdd, COLLECTION, new WriteOptions().database(DB)));