
Using OSS Select to Accelerate Data Queries in Spark

This topic describes how to configure Spark to use OSS Select to accelerate data queries, and the advantages of querying data with OSS Select.
Background information: All operations in this topic are performed on the CDH6 cluster built and configured as described in Using Apache Impala (CDH6) to Process OSS Data.
Note: All ${} placeholders in this topic are environment variables. Replace them according to your actual environment.
Step 1: Configure Spark to read and write OSS

Because Spark does not place the OSS support package on its CLASSPATH by default, you must configure this support yourself. Perform the following operations on all CDH nodes:

  1. Go to the ${CDH_HOME}/lib/spark directory and run the following commands:
     
    [root@cdh-master spark]# cd jars/
    [root@cdh-master jars]# ln -s ../../../jars/hadoop-aliyun-3.0.0-cdh6.0.1.jar hadoop-aliyun.jar
    [root@cdh-master jars]# ln -s ../../../jars/aliyun-sdk-oss-2.8.3.jar aliyun-sdk-oss-2.8.3.jar
    [root@cdh-master jars]# ln -s ../../../jars/jdom-1.1.jar jdom-1.1.jar
  2. In the ${CDH_HOME}/lib/spark directory, run a query:
     
    [root@cdh-master spark]# ./bin/spark-shell
    WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/CDH-6.0.1-1.cdh6.0.1.p0.590678/lib/spark) overrides detected (/opt/cloudera/parcels/CDH/lib/spark).
    WARNING: Running spark-class from user-defined location.
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://x.x.x.x:4040
    Spark context available as 'sc' (master = yarn, app id = application_1540878848110_0004).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.2.0-cdh6.0.1
          /_/

    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
    Type in expressions to have them evaluated.
    Type :help for more information.

    scala> val myfile = sc.textFile("oss://{your-bucket-name}/50/store_sales")
    myfile: org.apache.spark.rdd.RDD[String] = oss://{your-bucket-name}/50/store_sales MapPartitionsRDD[1] at textFile at <console>:24

    scala> myfile.count()
    res0: Long = 144004764

    scala> myfile.map(line => line.split('|')).filter(_(0).toInt >= 2451262).take(3)
    res15: Array[Array[String]] = Array(Array(2451262, 71079, 20359, 154660, 284233, 6206, 150579, 46, 512, 2160001, 84, 6.94, 11.38, 9.33, 681.83, 783.72, 582.96, 955.92, 5.09, 681.83, 101.89, 106.98, -481.07), Array(2451262, 71079, 26863, 154660, 284233, 6206, 150579, 46, 345, 2160001, 12, 67.82, 115.29, 25.36, 0.00, 304.32, 813.84, 1383.48, 21.30, 0.00, 304.32, 325.62, -509.52), Array(2451262, 71079, 55852, 154660, 284233, 6206, 150579, 46, 243, 2160001, 74, 32.41, 34.67, 1.38, 0.00, 102.12, 2398.34, 2565.58, 4.08, 0.00, 102.12, 106.20, -2296.22))

    scala> myfile.map(line => line.split('|')).filter(_(0) >= "2451262").saveAsTextFile("oss://{your-bucket-name}/spark-oss-test.1")

     If the queries above run successfully, the deployment has taken effect.
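The symlink setup in step 1 must be repeated on every CDH node, so it is convenient to script it. A minimal sketch, using the jar names and paths from the steps above (the function name is illustrative; `ln -sf` makes the script safe to re-run):

```shell
#!/bin/sh
# Link the OSS support jars from the CDH jars directory into Spark's
# jars directory, mirroring the manual commands in step 1.
# The argument is the CDH parcel root, e.g. /opt/cloudera/parcels/CDH.
link_oss_jars() {
    spark_jars="$1/lib/spark/jars"
    cdh_jars="$1/jars"
    ln -sf "$cdh_jars/hadoop-aliyun-3.0.0-cdh6.0.1.jar" "$spark_jars/hadoop-aliyun.jar"
    ln -sf "$cdh_jars/aliyun-sdk-oss-2.8.3.jar"         "$spark_jars/aliyun-sdk-oss-2.8.3.jar"
    ln -sf "$cdh_jars/jdom-1.1.jar"                     "$spark_jars/jdom-1.1.jar"
}
```

Run the function once per node (for example via your configuration management tool), passing the CDH parcel root that matches your installation.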
Step 2: Configure Spark to support OSS Select

For more information about OSS Select, see OSS Select. The following steps use the OSS endpoint oss-cn-shenzhen.aliyuncs.com. Perform the following operations on all CDH nodes:
  1. Download the OSS Select support package for Spark to the ${CDH_HOME}/jars directory (this support package is currently still in testing).
  2. Extract the support package. You can first list its contents:
     
    [root@cdh-master jars]# tar -tvf spark-2.2.0-oss-select-0.1.0-SNAPSHOT.tar.gz
    drwxr-xr-x root/root 0 2018-10-30 17:59 spark-2.2.0-oss-select-0.1.0-SNAPSHOT/
    -rw-r--r-- root/root 26514 2018-10-30 16:11 spark-2.2.0-oss-select-0.1.0-SNAPSHOT/stax-api-1.0.1.jar
    -rw-r--r-- root/root 547584 2018-10-30 16:11 spark-2.2.0-oss-select-0.1.0-SNAPSHOT/aliyun-sdk-oss-3.3.0.jar
    -rw-r--r-- root/root 13277 2018-10-30 16:11 spark-2.2.0-oss-select-0.1.0-SNAPSHOT/aliyun-java-sdk-sts-3.0.0.jar
    -rw-r--r-- root/root 116337 2018-10-30 16:11 spark-2.2.0-oss-select-0.1.0-SNAPSHOT/aliyun-java-sdk-core-3.4.0.jar
    -rw-r--r-- root/root 215492 2018-10-30 16:11 spark-2.2.0-oss-select-0.1.0-SNAPSHOT/aliyun-java-sdk-ram-3.0.0.jar
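After listing the archive, the jars it contains need to be extracted and made visible to Spark. The document does not show this part of the setup, so the following is only a sketch under the assumption that the extracted jars are linked into Spark's jars directory the same way as in step 1:

```shell
#!/bin/sh
# Extract the OSS Select support package and link its jars into Spark's
# jars directory. NOTE: linking into lib/spark/jars is an assumption
# based on the pattern of step 1; adjust to your deployment.
# The argument is the CDH parcel root, e.g. /opt/cloudera/parcels/CDH.
install_oss_select() {
    cdh_home="$1"
    pkg="$cdh_home/jars/spark-2.2.0-oss-select-0.1.0-SNAPSHOT.tar.gz"
    tar -xzf "$pkg" -C "$cdh_home/jars"
    for jar in "$cdh_home/jars/spark-2.2.0-oss-select-0.1.0-SNAPSHOT"/*.jar; do
        ln -sf "$jar" "$cdh_home/lib/spark/jars/$(basename "$jar")"
    done
}
```

As in step 1, this must be repeated on all CDH nodes.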