
[MINOR] improvement(spark-client): put sparkConf as extra properties while client request accessCluster #2254

Merged
merged 7 commits into apache:master from maobaolong:putSparkConfWhileAccessCluster on Nov 26, 2024

Conversation

maobaolong
Member

What changes were proposed in this pull request?

Put sparkConf into the extra properties when the client requests accessCluster.

Why are the changes needed?

The coordinator can decide whether a Spark application may access the RSS cluster via a customized access checker that leverages some of the Spark configs.
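
For illustration only (not part of this PR), a custom coordinator-side checker could then inspect the forwarded properties. The sketch below assumes the coordinator's `AccessChecker`/`AccessInfo`/`AccessCheckResult` types; the class name, the `yarn.queue` rule, and the exact constructor signatures are illustrative:

```java
// Hypothetical checker: denies access for applications submitted from a
// specific YARN queue, read from the extra properties forwarded by this PR.
// Coordinator-module imports are omitted; exact signatures may differ.
public class QueueBasedAccessChecker implements AccessChecker {
  private static final String BANNED_QUEUE = "banned-queue";

  @Override
  public AccessCheckResult check(AccessInfo accessInfo) {
    // "yarn.queue" is one of the properties visible in the audit log later in this thread.
    String queue = accessInfo.getExtraProperties().get("yarn.queue");
    if (BANNED_QUEUE.equals(queue)) {
      return new AccessCheckResult(false, "queue '" + queue + "' is banned");
    }
    return new AccessCheckResult(true, "");
  }
}
```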

Does this PR introduce any user-facing change?

No.

How was this patch tested?

No need

@maobaolong maobaolong force-pushed the putSparkConfWhileAccessCluster branch from e7e5f4f to 827ee04 on November 18, 2024 08:44

github-actions bot commented Nov 18, 2024

Test Results

2 943 files (+3)   2 943 suites (+3)   6h 18m 14s ⏱️ (+7m 42s)
1 096 tests (+4):  1 093 ✅ (+5),  2 💤 (±0),  1 ❌ (±0)
13 682 runs (+19): 13 651 ✅ (+20), 30 💤 (±0), 1 ❌ (±0)

For more details on these failures, see this check.

Results for commit 6c16c8c. ± Comparison against base commit 54611f3.

♻️ This comment has been updated with latest results.

zuston previously approved these changes Nov 19, 2024
Member

@zuston left a comment

LGTM. Although putting all confs into the general properties looks a little bit strange.

```diff
@@ -123,6 +123,7 @@ private boolean tryAccessCluster() {
     Map<String, String> extraProperties = Maps.newHashMap();
     extraProperties.put(
         ACCESS_INFO_REQUIRED_SHUFFLE_NODES_NUM, String.valueOf(assignmentShuffleNodesNum));
+    extraProperties.putAll(RssSparkConfig.sparkConfToMap(sparkConf));
```
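
For reference, `RssSparkConfig.sparkConfToMap` is assumed to be a plain flattening helper along these lines (a sketch, not the verbatim implementation):

```java
// Sketch: copy every SparkConf entry into a plain Map<String, String>.
public static Map<String, String> sparkConfToMap(SparkConf sparkConf) {
  Map<String, String> map = new HashMap<>();
  for (scala.Tuple2<String, String> entry : sparkConf.getAll()) {
    map.put(entry._1(), entry._2());
  }
  return map;
}
```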
Contributor

We shouldn't put Spark configurations into the coordinator. The coordinator isn't tied to any compute framework, while the Spark configurations are coupled with the Spark framework. You could have an extractor instead of passing them through directly.

Member Author

This PR is related to #2255. I want to determine whether to ban an app by a specific ban id extracted from one of the Spark client configs, but I don't want clients to have to configure it explicitly.

Contributor

I'm not sure whether it is a good design to have a ban id list. It doesn't seem common enough.

Contributor

It's not common if the bannedIdProvider contains `spark.`-prefixed keys.

@maobaolong
Member Author

@jerqi Thanks for your suggestion; I removed the `spark.` prefix to make it more generic in a new commit.

```java
// Put all spark conf into extra properties, except which length is longer than 100
// to avoid extra properties too long.
RssSparkConfig.toRssConf(sparkConf).getAll().stream()
    .filter(entry -> StringUtils.length((String) entry.getValue()) < 100)
```
Contributor

This is too tricky....

Member Author

Em... Thanks for your review! I'm sorry for this.

I would greatly appreciate it if you could give me some advice.

Contributor

You can implement a white list or a configuration filter. The white list would be set in the configuration, and the configuration filter could include a length filter whose threshold is also configurable. A sketch follows below.
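
A minimal sketch of that suggestion, with both knobs configurable (the class name and its wiring are hypothetical, not from the PR):

```java
import java.util.Set;

// Hypothetical filter combining an include (white) list with a maximum
// value length; both would be read from the client configuration.
public class ReportPropertyFilter {
  private final Set<String> includeKeys; // empty set means "no key restriction"
  private final int maxValueLength;

  public ReportPropertyFilter(Set<String> includeKeys, int maxValueLength) {
    this.includeKeys = includeKeys;
    this.maxValueLength = maxValueLength;
  }

  public boolean accept(String key, String value) {
    if (!includeKeys.isEmpty() && !includeKeys.contains(key)) {
      return false;
    }
    return value != null && value.length() <= maxValueLength;
  }
}
```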

Member Author

Do you think using a black list instead of a white list would be possible?

Contributor

@jerqi Nov 25, 2024

OK for me, too. `black list` would be better replaced by `excludeList`.

Member Author

done

@maobaolong
Member Author

@jerqi Exclude config added, PTAL


```java
RssConf rssConf = RssSparkConfig.toRssConf(sparkConf);
List<String> excludeProperties =
    rssConf.get(RssClientConf.RSS_CLIENT_REPORT_EXCLUDE_PROPERTIES);
```
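
Presumably the list is then applied when the extra properties are collected, roughly as follows (a sketch pieced together from the snippets in this thread; `extraProperties` is the map from the earlier diff):

```java
// Sketch: report every conf entry whose key is not on the exclude list.
rssConf.getAll().stream()
    .filter(entry -> !excludeProperties.contains(entry.getKey()))
    .forEach(
        entry -> extraProperties.put(entry.getKey(), String.valueOf(entry.getValue())));
```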
Contributor

The default is null? Will it cause an error?

Member Author

Thanks for your tip, I set a default value.

@maobaolong maobaolong force-pushed the putSparkConfWhileAccessCluster branch from 746adcf to 9b01609 on November 25, 2024 08:24
```diff
@@ -307,6 +307,6 @@ public class RssClientConf {
       ConfigOptions.key("rss.client.reportExcludeProperties")
           .stringType()
           .asList()
-          .noDefaultValue()
+          .defaultValues("hdfs.inputPaths")
```
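
For context, Spark users would set this option with the `spark.rss.` prefix, as the follow-up PR's test at the end of this thread does; the second key here is made up for illustration:

```
--conf spark.rss.client.reportExcludeProperties="hdfs.inputPaths,my.secret.conf"
```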
Contributor

Em... this is a weird value.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 53.84615% with 6 lines in your changes missing coverage. Please review.

Project coverage is 52.36%. Comparing base (477bc30) to head (9a3b25d).
Report is 9 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../java/org/apache/spark/shuffle/RssSparkConfig.java | 20.00% | 4 Missing ⚠️ |
| ...uniffle/shuffle/manager/RssShuffleManagerBase.java | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
```
@@             Coverage Diff              @@
##             master    #2254      +/-   ##
============================================
- Coverage     52.65%   52.36%   -0.29%
+ Complexity     3376     2969     -407
============================================
  Files           514      466      -48
  Lines         27565    22326    -5239
  Branches       2582     2044     -538
============================================
- Hits          14513    11691    -2822
+ Misses        12092     9890    -2202
+ Partials        960      745     -215
```

☔ View full report in Codecov by Sentry.
```java
// Put all spark conf into extra properties, except which length is longer than 100
// to avoid extra properties too long.
rssConf.getAll().stream()
    .filter(entry -> StringUtils.length((String) entry.getValue()) < 100)
```
Contributor

Should you remove the `< 100` filter?

Member Author

Done

```java
RssConf rssConf = RssSparkConfig.toRssConf(sparkConf);
List<String> excludeProperties =
    rssConf.get(RssClientConf.RSS_CLIENT_REPORT_EXCLUDE_PROPERTIES);
// Put all spark conf into extra properties, except which length is longer than 100
```
Contributor

Should you remove the comment?

Member Author

Done

```java
@@ -303,4 +303,10 @@ public class RssClientConf {
          .withDescription(
              "The block id manager class of server for this application, "
                  + "the implementation of this interface to manage the shuffle block ids");
  public static final ConfigOption<List<String>> RSS_CLIENT_REPORT_EXCLUDE_PROPERTIES =
```
Contributor

Add a blank line.

Member Author

done
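
For reference, after the rounds of review above the option presumably ends up looking like this (reconstructed from the diffs in this thread; the description text is assumed):

```java
public static final ConfigOption<List<String>> RSS_CLIENT_REPORT_EXCLUDE_PROPERTIES =
    ConfigOptions.key("rss.client.reportExcludeProperties")
        .stringType()
        .asList()
        .defaultValues("hdfs.inputPaths")
        .withDescription(
            "The conf keys excluded from the extra properties reported to the coordinator.");
```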

@maobaolong
Member Author

@jerqi Thanks for your review! Merging it.

@maobaolong maobaolong merged commit 204e4e4 into apache:master Nov 26, 2024
42 of 43 checks passed
@jerqi
Contributor

jerqi commented Nov 26, 2024

We forgot the documents.

@maobaolong
Member Author

@jerqi Sorry for that; I will add them this week.

jerqi pushed a commit that referenced this pull request Dec 4, 2024
…while client request accessCluster (#2254)

maobaolong added a commit that referenced this pull request Jan 4, 2025
…port extraProperties (#2265)

### What changes were proposed in this pull request?

- Add include key filter before report extraProperties
- Related docs

### Why are the changes needed?

- Add include key filter
- Add documents.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested on our test cluster.

- Client conf
```
--conf spark.shuffle.manager=org.apache.spark.shuffle.DelegationRssShuffleManager --conf spark.rss.access.id="access.id"
--conf spark.rss.client.reportIncludeProperties="spark.test.key,test2.key,app.id,yarn.queue" --conf spark.rss.client.reportExcludeProperties="app.id"
```

- Coordinator_rpc_audit.log

```
[2024-11-26 21:14:45.632] cmd=accessCluster	statusCode=ACCESS_DENIED	from=/XXX:34733	executionTimeUs=276(<1s)	appId=N/A	args{accessInfo=AccessInfo{accessId='rss-test-mbl-003-bannedid', user= tdwadmin, tags=[ss_v5], extraProperties={access_info_required_shuffle_nodes_num=-1, yarn.queue=g_teg_tdw_operlog-offline, spark.test.key=123_456}}}
```