The right way to work with Spark and GitHub pull requests (PRs)

Time: 2020-11-25

Recently, while upgrading our internal Spark version from Spark 2.x to Spark 3.0.1 and keeping it compatible with HDFS cdh-2.6.0-5.13.1, the build reported a compilation error:

[INFO] Compiling 25 Scala sources to /Users/libinsong/Documents/codes/tongdun/spark-3.0/resource-managers/yarn/target/scala-2.12/classes ...
[ERROR] [Error] resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:298: value setRolledLogsIncludePattern is not a member of org.apache.hadoop.yarn.api.records.LogAggregationContext
[ERROR] [Error] resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:300: value setRolledLogsExcludePattern is not a member of org.apache.hadoop.yarn.api.records.LogAggregationContext
[ERROR] [Error] resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:551: not found: value isLocalUri
[ERROR] [Error] resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:1367: not found: value isLocalUri
[ERROR] four errors found
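
For reference, the failing module is resource-managers/yarn. A build invocation along these lines reproduces the error (the profile names and the canonical form of the CDH version string are assumptions; resolving the CDH artifacts may also require Cloudera's Maven repository):

    # Assumed reproduction command; adjust profiles and repositories
    # to your environment.
    ./build/mvn -Pyarn -Phadoop-2.7 \
      -Dhadoop.version=2.6.0-cdh5.13.1 \
      -DskipTests clean package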

The solution has already been given in a Spark PR on GitHub. We could simply edit the corresponding code by hand, but there is a more elegant option: applying the PR's commit with git cherry-pick. Here is a brief walkthrough.

Go straight to the line containing setRolledLogsIncludePattern:

 sparkConf.get(ROLLED_LOG_INCLUDE_PATTERN).foreach { includePattern =>
      try {
        val logAggregationContext = Records.newRecord(classOf[LogAggregationContext])
        logAggregationContext.setRolledLogsIncludePattern(includePattern)
        sparkConf.get(ROLLED_LOG_EXCLUDE_PATTERN).foreach { excludePattern =>
          logAggregationContext.setRolledLogsExcludePattern(excludePattern)
        }
        appContext.setLogAggregationContext(logAggregationContext)
      } catch {
        case NonFatal(e) =>
          logWarning(s"Ignoring ${ROLLED_LOG_INCLUDE_PATTERN.key} because the version of YARN " +
            "does not support it", e)
      }
    }
    appContext.setUnmanagedAM(isClientUnmanagedAMEnabled)

    sparkConf.get(APPLICATION_PRIORITY).foreach { appPriority =>
      appContext.setPriority(Priority.newInstance(appPriority))
    }
    appContext
  }
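
A note on why the try/NonFatal block above does not help here: the failure is at compile time, because setRolledLogsIncludePattern and setRolledLogsExcludePattern simply do not exist in the LogAggregationContext of this Hadoop 2.6.0 based classpath. One way such code can be made to compile against older YARN APIs is to resolve the setters reflectively; a minimal sketch of that idea (illustration only, not necessarily the exact change in the PR):

    // Sketch only: resolve the setter at runtime instead of referencing it
    // statically, so the code compiles even when the method is absent.
    import org.apache.hadoop.yarn.api.records.LogAggregationContext
    import scala.util.control.NonFatal

    def setRolledLogPattern(
        ctx: LogAggregationContext,
        setterName: String,
        pattern: String): Unit = {
      try {
        val setter = ctx.getClass.getMethod(setterName, classOf[String])
        setter.invoke(ctx, pattern)
      } catch {
        case NonFatal(_) => () // YARN without rolled-log support: skip quietly
      }
    }

    // e.g. setRolledLogPattern(logAggregationContext,
    //   "setRolledLogsIncludePattern", includePattern)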

It turns out that the code on master is not what we want. At this point we can use git blame: on GitHub, open the blame view for Client.scala:
(screenshot: GitHub blame view of Client.scala)
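
If you prefer the command line to GitHub's blame page, the same history can be inspected locally (the line range below is illustrative; point it at the lines that fail to compile):

    # Blame the region around the failing calls (line range is illustrative)
    git blame -L 290,310 -- resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala

    # Or list every commit that ever touched the file
    git log --oneline -- resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala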

The blame view shows that this code has been modified many times. Find the commit for SPARK-19545 ("fix compile issue for spark on yarn when building...") and click into it:
(screenshot: blame history with the SPARK-19545 change)
Find the corresponding commit id:
(screenshot: the commit page showing the commit id)

Execute git cherry-pick 8e8afb3a3468aa743d13e23e10e77e94b772b2ed to apply that commit onto your working branch.
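
A minimal sketch of the whole flow, including conflict handling (the branch name is made up for illustration):

    # Work on a dedicated branch (name is illustrative)
    git checkout -b spark-3.0.1-cdh-compat
    git cherry-pick 8e8afb3a3468aa743d13e23e10e77e94b772b2ed

    # If the pick reports conflicts, resolve them in the listed files, then:
    git add resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
    git cherry-pick --continue

    # Confirm the original author and message were carried over
    git log -1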
In this way we not only apply the fix, but also preserve the original commit information, which makes the change easy to track later.