- MLeap 序列化
- 创建一个简单的 MLeap Pipeline
 - 序列化为 Zip 文件
- JSON 格式
 - Protobuf 格式
 
 - 序列化为目录
- JSON 格式
 - Protobuf 格式
 
 - 反序列化
- 反序列化 Zip Bundle
 - 反序列化目录 Bundle
 
 
 
MLeap 序列化
MLeap 中序列化和反序列化都非常简单。你可以选择序列化 MLeap Bundle 到文件系统中的一个目录,或者是序列化为一个 Zip 压缩包以便用于后期分发。
创建一个简单的 MLeap Pipeline
import ml.combust.bundle.BundleFileimport ml.combust.bundle.serializer.SerializationFormatimport ml.combust.mleap.core.feature.{OneHotEncoderModel, StringIndexerModel}import ml.combust.mleap.core.regression.LinearRegressionModelimport ml.combust.mleap.runtime.transformer.Pipelineimport ml.combust.mleap.runtime.transformer.feature.{OneHotEncoder, StringIndexer, VectorAssembler}import ml.combust.mleap.runtime.transformer.regression.LinearRegressionimport org.apache.spark.ml.linalg.Vectorsimport ml.combust.mleap.runtime.MleapSupport._import resource._// Create a sample pipeline that we will serialize// And then deserialize using various formatsval stringIndexer = StringIndexer(shape = NodeShape.scalar(inputCol = "a_string", outputCol = "a_string_index"),model = StringIndexerModel(Seq("Hello, MLeap!", "Another row")))val oneHotEncoder = OneHotEncoder(shape = NodeShape.vector(1, 2, inputCol = "a_string_index", outputCol = "a_string_oh"),model = OneHotEncoderModel(2, dropLast = false))val featureAssembler = VectorAssembler(shape = NodeShape().withInput("input0", "a_string_oh").withInput("input1", "a_double").withStandardOutput("features"),model = VectorAssemblerModel(Seq(TensorShape(2), ScalarShape())))val linearRegression = LinearRegression(shape = NodeShape.regression(3),model = LinearRegressionModel(Vectors.dense(2.0, 3.0, 6.0), 23.5))val pipeline = Pipeline(shape = NodeShape(),model = PipelineModel(Seq(stringIndexer, oneHotEncoder, featureAssembler, linearRegression)))
序列化为 Zip 文件
In order to serialize to a zip file, make sure the URI begins with jar:file and ends with a .zip.
为了序列化为 Zip 文件,需要确保 URL 以 jar:file 开头,以 .zip 结尾。
For example jar:file:/tmp/mleap-bundle.zip.
例如: jar:file:/tmp/mleap-bundle.zip。
JSON 格式
for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-json.zip"))) {pipeline.writeBundle.format(SerializationFormat.Json).save(bundle)}
Protobuf 格式
for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-protobuf.zip"))) {pipeline.writeBundle.format(SerializationFormat.Protobuf).save(bundle)}
序列化为目录
为了序列化为目录,需要确保 URL 以 file 开头。
例如: file:/tmp/mleap-bundle-dir
JSON 格式
for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-json-dir"))) {pipeline.writeBundle.format(SerializationFormat.Json).save(bundle)}
Protobuf 格式
for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-protobuf-dir"))) {pipeline.writeBundle.format(SerializationFormat.Protobuf).save(bundle)}
反序列化
反序列化和序列化一样简单,你无需事先知道 MLeap Bundle 的序列化格式,唯一需要了解的,是这个包的路径。
反序列化 Zip Bundle
// Deserialize a zip bundle// Use Scala ARM to make sure resources are managed properlyval zipBundle = (for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-json.zip"))) yield {bundle.loadMleapBundle().get}).opt.get
反序列化目录 Bundle
// Deserialize a directory bundle// Use Scala ARM to make sure resources are managed properlyval dirBundle = (for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-json-dir"))) yield {bundle.loadMleapBundle().get}).opt.get
