Slowdown for cluster load 10% to 190% #12

Open
AluriJaganMohini opened this issue Dec 29, 2020 · 2 comments
AluriJaganMohini commented Dec 29, 2020

Dear Hongzi,

I am trying to reproduce all the results reported in the paper. From the source code, it is unclear how to plot the slowdown for cluster load from 10% to 190%. When I run run_script.py, I can see the generated logs, but nothing in them corresponds to Figure 4.

Could you please explain in detail how you plot the slowdown for cluster load from 10% to 190%? From the source code, it appears you vary the job rate from 0.1 to 1.0 to sweep the load from 10% to 190%, but when I relied only on the job rate in that range, the slowdown stayed constant from 100% up to 190%.

  1. It would be great if you could say a few words about how the load is varied from 10% to 190%.
  2. Can you please tell me how to reproduce Figure 4, or how to proceed from the generated logs to obtain it?

Thank You.

@hongzimao (Owner) commented:
Thanks for the detailed question, and sorry for the late reply. Load larger than 100% has to be ephemeral. Two things affect the system load: the interval between new jobs, as you pointed out, and the new job size distribution.

```python
def normal_dist(self):
    # -- new work duration --
    nw_len = np.random.randint(1, self.job_len + 1)  # same length in every dimension

    # -- new work resource request --
    nw_size = np.zeros(self.num_res)
    for i in range(self.num_res):
        nw_size[i] = np.random.randint(1, self.max_nw_size + 1)

    return nw_len, nw_size

def bi_model_dist(self):
    # -- job length --
    if np.random.rand() < self.job_small_chance:  # small job
        nw_len = np.random.randint(self.job_len_small_lower,
                                   self.job_len_small_upper + 1)
    else:  # big job
        nw_len = np.random.randint(self.job_len_big_lower,
                                   self.job_len_big_upper + 1)

    # -- job resource request --
    nw_size = np.zeros(self.num_res)
    dominant_res = np.random.randint(0, self.num_res)
    for i in range(self.num_res):
        if i == dominant_res:
            nw_size[i] = np.random.randint(self.dominant_res_lower,
                                           self.dominant_res_upper + 1)
        else:
            nw_size[i] = np.random.randint(self.other_res_lower,
                                           self.other_res_upper + 1)

    return nw_len, nw_size
```

You can compute the load as the average area of new jobs arriving per time interval, divided by the width (capacity) of the bottlenecked resource. If I remember correctly, since our simulated interval is finite, the scenario is ephemeral. You can vary the two distributions to create different loads. Hope this helps!
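To make that calculation concrete, here is a minimal sketch of the load computation under the bi-modal distribution. All the numeric values below are illustrative assumptions (not the repository's exact defaults), and the variable names mirror the code above but are chosen for this example:

```python
# Hypothetical parameters mirroring the bi-modal distribution above;
# values are illustrative assumptions, not the repo's exact defaults.
job_small_chance = 0.8
job_len_small = (1, 3)      # inclusive (lower, upper) bounds
job_len_big = (10, 15)
dominant_res = (5, 10)      # request range on the dominant resource
res_slot = 10               # width (capacity) of each resource
new_job_rate = 0.3          # expected new jobs per timestep

def expected_uniform(lo, hi):
    # mean of np.random.randint(lo, hi + 1), a discrete uniform draw
    return (lo + hi) / 2.0

# Expected job length under the small/big mixture
e_len = (job_small_chance * expected_uniform(*job_len_small)
         + (1 - job_small_chance) * expected_uniform(*job_len_big))

# Expected request on the bottlenecked (dominant) resource
e_dominant = expected_uniform(*dominant_res)

# Load = expected work area arriving per timestep
#        / capacity of the bottlenecked resource
load = new_job_rate * e_len * e_dominant / res_slot
print(load)  # → 0.9225, i.e. roughly 92% load
```

Sweeping `new_job_rate` (or the size/length ranges) then moves the load above or below 100%, which is how different cluster loads can be generated.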

@AluriJaganMohini (Author) commented:

Thank you for your response. I was able to figure it out. One more thing I wanted to clarify is the res_slot value. In this repository you use res_slot = 10, but in another repository you changed it to 20. Using 20 gives a very good slowdown, below 2, for the heuristics (SJF, Packer) as well, whereas the results reported in the paper show a much larger slowdown than those obtained with res_slot = 20. So, can we say that different parameter values yield different slowdown values? And does res_slot = 20 give different slowdown values while preserving the qualitative behaviour reported in the paper?
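For reference, the slowdown numbers being compared here can be sketched as below. The field names (`enter`, `finish`, `len`) are illustrative assumptions about what the simulator logs per job, not the repository's exact API:

```python
# Per-job slowdown: total time in system divided by the job's ideal
# duration; it is >= 1 by definition.
def job_slowdown(enter_time, finish_time, job_len):
    return (finish_time - enter_time) / float(job_len)

# Hypothetical (enter, finish, len) tuples for three completed jobs
jobs = [(0, 5, 5), (2, 12, 4), (3, 30, 9)]

slowdowns = [job_slowdown(*j) for j in jobs]   # [1.0, 2.5, 3.0]
avg_slowdown = sum(slowdowns) / len(slowdowns)
print(avg_slowdown)  # → 2.1666...
```

Since waiting time grows with contention, a larger res_slot (more capacity at the same job sizes) would naturally pull this average down, which is consistent with the lower heuristic slowdowns observed at res_slot = 20.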

Waiting for your response.
