Connecting to a remote Spark master - Java / Scala

I created a 3-node (1 master, 2 workers) Apache Spark cluster in AWS. I'm able to submit jobs to the cluster from the master, but I cannot get it to work remotely.
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/usr/local/spark/README.md" // Should be some file on your system
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    sc.stop()
  }
}

I can see from the master:
Spark Master at spark://ip-171-13-22-125.ec2.internal:7077
URL: spark://ip-171-13-22-125.ec2.internal:7077
REST URL: spark://ip-171-13-22-125.ec2.internal:6066 (cluster mode)

So when I execute SimpleApp.scala from my local machine, it fails to connect to the Spark master:
2017-02-04 19:59:44,074 INFO [appclient-register-master-threadpool-0] client.StandaloneAppClient$ClientEndpoint (Logging.scala:54) [] - Connecting to master spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077...
2017-02-04 19:59:44,166 WARN [appclient-register-master-threadpool-0] client.StandaloneAppClient$ClientEndpoint (Logging.scala:87) [] - Failed to connect to spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
  at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) ~[spark-core_2.10-2.0.2.jar:2.0.2]
  at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) ~[spark-core_2.10-2.0.2.jar:2.0.2]
  at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) ~[scala-library-2.10.0.jar:?]
  at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) ~[spark-core_2.10-2.0.2.jar:2.0.2]

I know it would have worked if I had set the master to local, because then it would run locally. However, I want my client to connect to this remote master. How can I accomplish that? The Spark configuration looks fine, I can even telnet to that public DNS and port, and I have configured /etc/hosts with the public DNS and hostname of each EC2 instance. I want to be able to submit jobs to this remote master. What am I missing?
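For reference, the application is compiled and run from the local machine; a minimal build.sbt sketch for that, assuming an sbt project and taking the Spark 2.0.2 / Scala 2.10 versions from the jar names in the log above (the project name and layout are placeholders, not taken from the question):

// build.sbt -- hypothetical project definition; versions inferred from the log (spark-core_2.10-2.0.2)
name := "simple-app"
version := "0.1"
scalaVersion := "2.10.6"

// Spark is often marked "provided" when the jar is submitted with spark-submit,
// but it is left as a compile dependency here so the app can also be run directly from the IDE.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2"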
Best answer
To bind the master host name/IP, go to the conf directory of your Spark installation (spark-2.0.2-bin-hadoop2.7/conf) and create a spark-env.sh file with the command below:
cp spark-env.sh.template spark-env.sh

Open the spark-env.sh file in an editor and add the line below with the host name/IP of your master:
SPARK_MASTER_HOST=ec2-54-245-111-320.compute-1.amazonaws.com

Stop and start Spark with stop-all.sh and start-all.sh. Now you can connect to the remote master with:
val spark = SparkSession.builder()
  .appName("SparkSample")
  .master("spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077")
  .getOrCreate()

For more information on setting environment variables, please check http://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts
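To tie this back to the original program, the same line-count check could be run through that session once the master has been rebound; a minimal sketch, assuming /usr/local/spark/README.md is readable on the cluster nodes (the object and application names here are placeholders):

import org.apache.spark.sql.SparkSession

object RemoteCheck {
  def main(args: Array[String]): Unit = {
    // Connect to the standalone master that was rebound via SPARK_MASTER_HOST
    val spark = SparkSession.builder()
      .appName("RemoteCheck")
      .master("spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077")
      .getOrCreate()

    // The file must be readable from the worker nodes, not only from the submitting machine
    val logData = spark.read.textFile("/usr/local/spark/README.md").cache()
    val numAs = logData.filter(_.contains("a")).count()
    val numBs = logData.filter(_.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")

    spark.stop()
  }
}

If registration still fails after this, it is usually worth confirming that the AWS security group opens port 7077 to the client and that the workers can reach the driver machine in the opposite direction, since executors connect back to the driver.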