Using OSS Select with Spark to Accelerate Data Queries (Part 3)


sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4bdef487

scala> sqlContext.sql("CREATE TEMPORARY VIEW people USING com.aliyun.oss " +
|"OPTIONS (" +
|"oss.bucket 'select-test-sz', " +
|"oss.prefix 'people', " + // objects with this prefix belong to this table
|"oss.schema 'name string, company string, age long'," + // like 'column_a long, column_b string'
|"oss.data.format 'csv'," + // we only support csv now
|"oss.input.csv.header 'None'," +
|"oss.input.csv.recordDelimiter '\r\n'," +
|"oss.input.csv.fieldDelimiter ','," +
|"oss.input.csv.commentChar '#'," +
|"oss.input.csv.quoteChar '\"'," +
|"oss.output.csv.recordDelimiter '\n'," +
|"oss.output.csv.fieldDelimiter ','," +
|"oss.output.csv.quoteChar '\"'," +
|"oss.endpoint 'oss-cn-shenzhen.aliyuncs.com', " +
|"oss.accessKeyId 'Your Access Key Id', " +
|"oss.accessKeySecret 'Your Access Key Secret')")
res0: org.apache.spark.sql.DataFrame = []

scala> val sql: String = "select count(*) from people where name like 'Lora%'"
sql: String = select count(*) from people where name like 'Lora%'

scala> sqlContext.sql(sql).show()
+--------+
|count(1)|
+--------+
|   31770|
+--------+

scala> val textFile = sc.textFile("oss://select-test-sz/people/")
textFile: org.apache.spark.rdd.RDD[String] = oss://select-test-sz/people/ MapPartitionsRDD[8] at textFile at <console>:24

scala> textFile.map(line => line.split(',')).filter(_(0).startsWith("Lora")).count()
res3: Long = 31770

As the figure below shows, the query took 15 s with OSS Select and 54 s without it, so OSS Select speeds up this query substantially (about 3.6x in this test).

[Figure: Using OSS Select with Spark to accelerate data queries]


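Implementation of the Spark support package for OSS Select (Preview)


Spark can be connected to OSS Select by extending Spark's DataSource API. By implementing PrunedFilteredScan, the required columns and the filter conditions can be pushed down to OSS Select for execution. The support package is still under development; the conventions it defines and the filter conditions it supports are as follows: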
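
To make the pushdown idea concrete, here is a minimal sketch of how the two arguments Spark passes to `PrunedFilteredScan.buildScan` (the required columns and the filters) could be rendered as a SQL statement. It is not the support package's code: `PushdownSketch`, `compile` and `buildSql` are hypothetical names, and the table name `ossobject` and the use of column names rather than positions are assumptions about what the OSS Select side accepts. Only the `org.apache.spark.sql.sources` filter classes are real Spark API.

```scala
import org.apache.spark.sql.sources._

// Hypothetical helper, not part of the support package.
object PushdownSketch {
  // Render a literal; strings are single-quoted with embedded quotes doubled.
  private def literal(v: Any): String = v match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  // Translate one Spark filter, or return None if this sketch cannot push it down.
  // Note: a prefix containing % or _ would need escaping, which is omitted here.
  def compile(f: Filter): Option[String] = f match {
    case EqualTo(attr, v)          => Some(s"$attr = ${literal(v)}")
    case GreaterThan(attr, v)      => Some(s"$attr > ${literal(v)}")
    case LessThan(attr, v)         => Some(s"$attr < ${literal(v)}")
    case StringStartsWith(attr, p) => Some(s"$attr like ${literal(p + "%")}")
    case And(l, r)                 => for (a <- compile(l); b <- compile(r)) yield s"($a and $b)"
    case _                         => None
  }

  // Build the statement from the arguments that buildScan receives.
  // "ossobject" is an assumed table name for the remote object.
  def buildSql(requiredColumns: Array[String], filters: Array[Filter]): String = {
    val cols  = if (requiredColumns.isEmpty) "*" else requiredColumns.mkString(", ")
    val preds = filters.flatMap(f => compile(f).toList)
    val where = if (preds.isEmpty) "" else preds.mkString(" where ", " and ", "")
    s"select $cols from ossobject$where"
  }
}

// Example: the filter Spark derives from "name like 'Lora%'".
println(PushdownSketch.buildSql(Array("name"), Array(StringStartsWith("name", "Lora"))))
// prints: select name from ossobject where name like 'Lora%'
```

Filters the sketch cannot translate are simply left out of the statement. That is safe as long as the relation keeps the default `unhandledFilters`, because Spark then re-applies every filter to the rows the scan returns. The conventions and filter conditions defined by the package itself are listed next.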