\section{Workflow and Workload Management}
\label{wms}
%\fixme{(authors: Maxim, Anders, Simon)}
\subsection{The Challenge of the Three Domains}
\label{3domains}
In the past three decades, the technological revolution in industry has both enabled and been paralleled by growing complexity in the field of scientific computing, where
ever more sophisticated methods of data processing and analysis were introduced, oftentimes at a dramatically increased scale.
Processing power and storage became increasingly decentralized, leading to the need to manage these distributed resources in an optimal manner.
At the same time, the increasing sophistication of scientific workflows created the need to support these workflows in the new distributed computing medium.
The rapid evolution of the field and time pressure to deliver in this competitive environment led to the design and implementation of complete (to varying degrees)
and successful solutions satisfying the needs of specific research projects. In general, this had two consequences:
\begin{itemize}
\item An integrated and oftentimes (though not always) project-specific design of workflow and workload management (see~\ref{workflow_workload} for definitions).
\item Tight coupling of workflow and workload management to data handling components.
\end{itemize}
We observe that there are essentially \textbf{three interconnected domains} involved in this subject: Workflow Management, Workload Management,
and Data Management. In many systems (cf. Pegasus~\cite{pegasus}) some of these domains can be ``fused'' (e.g. Workflow+Workload).
In the following, we bring together a few standard definitions and real-life examples to help clarify the
relationships among these domains, and in doing so form the basis for possible HEP-FCE recommendations. Our goal is twofold:
\begin{itemize}
\item to identify the features and design considerations that have proven successful and can serve as guidance going forward;
\item to identify common design and implementation elements, and to develop an understanding of how to enhance the \textit{reusability} of existing and future systems of this kind.
\end{itemize}
\subsection{Description}
\subsubsection{Grid and Cloud Computing}
\label{cloud}
According to a common definition, \textbf{Grid Computing} is the collection of computer resources from multiple locations to reach a common goal. Additional
characteristics often added to this definition include decentralized administration and management, and adherence to open standards. The concept was formulated in the early 1990s,
motivated by the fact that the computational tasks handled by large research projects had reached the limits of scalability of most individual computing sites.
At the same time, due to variations in demand, some resources were underutilized at times. There is therefore a benefit in implementing a federation of computing
resources, whereby large spikes in demand can be handled by federated sites, while the available capacity is ``backfilled'' with lower-priority tasks submitted by a
larger community of users. Technologies developed in the framework of Grid Computing (such as a few reliable and popular types of \textit{Grid Middleware})
became a major enabling factor for many scientific collaborations, including those in nuclear and high-energy physics.
\textbf{Cloud Computing} is essentially an evolution of the Grid Computing concept, with a higher implied degree of abstraction of computing resources and data
storage, connectivity and transparency of access. In addition, Cloud Computing is characterized by widespread adoption of \textit{virtualization}, which is
also used in the traditional Grid environment but on a somewhat smaller scale. At the time of writing, the ``Cloud'' figures prominently in the context of commercial services available on
the Internet, whereby computing resources can be ``rented'' for a fee in the form of a Virtual Machine allocated to the user, or a number of nodes in the Cloud can
be dynamically assigned to perform a necessary computational task, often as a transient, ephemeral resource. This dynamic, on-demand characteristic of the
Cloud led to it being described as an ``elastic'' resource, an attribute prominently featured in the name of the ``Amazon Elastic Compute Cloud'' (EC2).
This is an example of a \textit{public} Cloud, available to most entities in the open marketplace. Some organizations choose to deploy Cloud technology on
computing resources directly owned and controlled by them, which is referred to as a \textit{private} Cloud.
Regardless of the obvious distinguishing features of Cloud Computing (its nature as \textit{a utility computing platform} and its pervasive reliance on virtualization),
many of its fundamental concepts and challenges are shared with its predecessor, Grid Computing. In fact, the boundary is blurred even further by existing efforts to
enhance Grid middleware with tools based on virtualization and Cloud APIs, which essentially extend ``traditional'' Grid resources with on-demand, elastic Cloud
capability~\cite{star_acat11}, leading to what is essentially a hybrid system. The two are often seen as ``complementary technologies that will coexist at
different levels of resource abstraction''~\cite{atlas_cloud_chep13}. Moreover,
\begin{quote}
\textit{In parallel, many existing grid sites have begun internal evaluations of cloud technologies (such as Open Nebula or OpenStack) to reorganize the internal management of their computing resources.}
(See \cite{atlas_cloud_chep12})
\end{quote}
In recent years, a few open-source, community-developed and supported Cloud Computing platforms have reached maturity, such as OpenStack~\cite{openstack}.
It includes a comprehensive set of components such as ``Compute'', ``Networking'', ``Storage'' and others, and is designed to run on standard commodity hardware.
It is deployed at scale by large organizations and serves as the foundation for commercial Cloud services such as HP Helion~\cite{helion}. This technology allows pooling
of resources of both public and private Clouds, resulting in the so-called \textit{Hybrid Cloud}, which aims to achieve the best characteristics of both private
and public Clouds.
There are no conceptual or architectural barriers to running HEP-type workflows in the Cloud, and in fact efforts are well under way to implement this
approach~\cite{atlas_cloud_chep12}. However, there are caveats, such as:
\begin{itemize}
\item A careful overall cost analysis needs to be performed before each decision to deploy on the Cloud, as it is not universally cheaper than
resources already deployed at research centers such as National Laboratories (a toy break-even sketch follows this list). At the same time, the Cloud is an efficient way to handle peak demand
for computing power, due to its elasticity. It should also be noted that some national supercomputing centers have made part of their
capacity available for more general high-throughput use and may be a cost-effective alternative.
\item The available bandwidth for staging data into and out of the Cloud (and again, its cost) needs to be quantified and gauged against the project requirements.
\item Cloud storage costs may be an issue for experiments handling massive amounts of data~\cite[p.~11]{atlas_cloud_chep12}.
\end{itemize}
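As an illustration of the first caveat, the toy sketch below compares the amortized cost of a core-hour on owned hardware with the on-demand price of a rented Cloud instance. Every number in it (node price, instance price, utilization, overhead) is a hypothetical placeholder that must be replaced with real figures for the project in question; the point is only the shape of the break-even estimate, not its result.
\begin{verbatim}
# Toy break-even estimate for bursting a demand peak to a commercial cloud.
# All numbers below are hypothetical placeholders, not measured costs.

def owned_cost_per_core_hour(node_price, cores, lifetime_hours,
                             utilization, overhead=1.5):
    """Amortized cost of one core-hour on owned hardware; 'overhead'
    folds in power, cooling and operations as a crude multiplier."""
    return node_price * overhead / (cores * lifetime_hours * utilization)

def cloud_cost_per_core_hour(instance_price_per_hour, cores):
    """On-demand price of one core-hour for a given instance type."""
    return instance_price_per_hour / cores

owned = owned_cost_per_core_hour(node_price=8000.0, cores=32,
                                 lifetime_hours=4 * 365 * 24,
                                 utilization=0.85)
cloud = cloud_cost_per_core_hour(instance_price_per_hour=1.50, cores=32)

peak_core_hours = 2.0e6   # size of the demand spike to be absorbed
print(f"owned: {owned:.4f} USD/core-h, cloud: {cloud:.4f} USD/core-h")
print(f"extra cost of bursting the peak to the cloud: "
      f"{(cloud - owned) * peak_core_hours:,.0f} USD")
\end{verbatim}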
\subsubsection{From the Grid to Workload Management}
\label{from_grid_to_workload}
Utilization of Grid sites via appropriate middleware does establish a degree of resource federation, but it leaves it up to the user to manage data movement and job submission to multiple sites,
track job status, handle failures and error conditions, aggregate bookkeeping information, and perform many other tasks. In the absence of automation this does not scale very well and limits the efficacy
of the overall system.
It was therefore inevitable that soon after the advent of reliable Grid middleware multiple physics experiments and other projects started developing and deploying \textbf{Workload Management Systems (WMS)}.
According to one definition,
\textit{``the purpose of the Workload Manager Service (WMS) is to accept requests for job submission and management coming from its clients and take the appropriate actions to satisfy them''}~\cite{egee_user_guide}.
Thus, one of the principal functions of a Workload Management System can be described as ``brokerage'', in the sense that it matches resource requests to the actual distributed resources
on multiple sites. This matching process can include a variety of factors such as access rules for users and groups, priorities set in the system, or even data locality - which is in fact an important and interesting part of this process~\cite{panda_chep10}.
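As a purely illustrative sketch of this brokerage logic (and not the algorithm of any particular WMS), the fragment below ranks candidate sites for a job by combining access rules, configured site priorities and data locality; all field names, site names and weights are hypothetical.
\begin{verbatim}
# Schematic broker: rank candidate sites for a job using access rules,
# configured priorities and data locality. Names and weights are
# illustrative only.

def eligible(job, site):
    """Access rules: the job's group must be allowed at the site."""
    return job["group"] in site["allowed_groups"]

def score(job, site, replica_catalog):
    """Higher is better; favour sites that already hold the input data."""
    has_data = site["name"] in replica_catalog.get(job["dataset"], set())
    free = site["free_slots"] / max(site["total_slots"], 1)
    return 3.0 * has_data + 1.0 * site["priority"] + 0.5 * free

def broker(job, sites, replica_catalog):
    candidates = [s for s in sites if eligible(job, s)]
    return max(candidates, default=None,
               key=lambda s: score(job, s, replica_catalog))

sites = [
    {"name": "SITE_A", "allowed_groups": {"prod"}, "priority": 1.0,
     "free_slots": 200, "total_slots": 1000},
    {"name": "SITE_B", "allowed_groups": {"prod", "user"}, "priority": 0.5,
     "free_slots": 800, "total_slots": 1000},
]
replicas = {"mc_dataset_001": {"SITE_B"}}
job = {"group": "prod", "dataset": "mc_dataset_001"}
print(broker(job, sites, replicas)["name"])   # SITE_B: data locality wins
\end{verbatim}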
In practice, despite differing approaches and features, most existing WMS appear to share certain primary goals, and provide solutions to achieve these (to a varying degree). Some examples are:
\begin{itemize}
\item Insulation of the user from the complex and potentially heterogeneous environment of the Grid, and shielding the user from common failure modes of the Grid infrastructure.
\item Rationalization, bookkeeping and automation of software provisioning - for example, distribution of binaries and configuration files to multiple sites.
\item Facilitation of basic monitoring functions, e.g. providing efficient ways to determine the status of jobs being submitted and executed.
\item Prioritization and load balancing across the computing sites.
\item Implementation of interfaces to external data management systems, or actual data movement and monitoring functionality built into certain components of the WMS.
\end{itemize}
We shall present examples of existing WMS in one of the following sections.
\subsubsection{Workflow vs Workload}
\label{workflow_workload}
A \textit{scientific workflow} system is a specialized case of a \textbf{workflow management system}, in which computations and/or transformations and exchange of data are performed according to a defined set of rules
in order to achieve an overall goal~\cite{grid_workflow_taxonomy,grid_workflow_fit,pegasus}. In the context of this document, this process involves execution on distributed resources. Since the process is
typically largely (or completely) automated, it is often described as ``orchestration'' of execution of multiple interdependent tasks. Workflow systems are sometimes described using the concepts of a \textit{control flow},
which refers to the logic of execution, and \textit{data flow}, which concerns itself with the logic and rules of transmitting data. There are various patterns identified in both control and data flow~\cite{workflow_patterns}.
A complete discussion of this subject is beyond the scope of this document.
A simple and rather typical example of a workflow is often found in Monte Carlo simulation studies performed in High Energy Physics and related fields, where there is a chain of processing steps similar to the pattern below:
\\
\\
\textit{Event Generation $\Longrightarrow$ Simulation $\Longrightarrow$ Digitization $\Longrightarrow$ Reconstruction}
\\
\\
Patterns like this one may also include optional additional steps (implied or made explicit in the logic of the workflow) such as merging of units of data (e.g. files) for more efficient storage and transmission.
Even the most trivial cases of processing, with one step only, may involve multiple files in input and/or output streams, which creates the need to manage this as a workflow. Oftentimes, however,
workflows that need to be created by researchers are quite complex. At extreme scale, understanding the behavior of scientific workflows becomes a challenge and a subject of study in its own right~\cite{panorama}.
Many (but not all) workflows important in the context of this document can be modeled as
\textbf{Directed Acyclic Graphs} (DAG)~\cite{pegasus,deft1,grid_workflow_taxonomy}.
Conceptually, this level of abstraction of the workflow does not involve issues of resource provisioning and utilization, monitoring, optimization, or recovery from errors, nor a plethora of other items essential
for efficient execution of workflows in the distributed environment. These tasks are handled in the context of \textit{Workload Management}, which we briefly described in~\ref{from_grid_to_workload}.
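As an illustration, the simulation chain shown above can be written down as a small DAG. The sketch below, with hypothetical task names, encodes the dependencies and derives a valid execution order by topological sorting, which is essentially the first step a workflow engine performs before handing individual tasks over to the workload management layer.
\begin{verbatim}
# The Monte Carlo chain expressed as a DAG: each task lists the tasks it
# depends on. A topological sort yields a valid execution order.
from graphlib import TopologicalSorter   # Python >= 3.9

workflow = {
    "evgen":       set(),              # Event Generation
    "simulate":    {"evgen"},          # Simulation
    "digitize":    {"simulate"},       # Digitization
    "reconstruct": {"digitize"},       # Reconstruction
    "merge":       {"reconstruct"},    # optional merging of output files
}

order = list(TopologicalSorter(workflow).static_order())
print(order)  # ['evgen', 'simulate', 'digitize', 'reconstruct', 'merge']
\end{verbatim}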
In summary, \textbf{we make a distinction between the Workflow Management} domain, which concerns itself with controlling the scientific workflow, \textbf{and Workload Management}, which
is the domain of resource provisioning, allocation, execution control, and monitoring of execution. The former is a level of abstraction above Workload Management, whereas the latter is in
turn a layer of abstraction above the distributed execution environment, such as the Grid or Cloud.
\subsubsection{HPC vs HTC}
The term \textbf{High-Performance Computing} (HPC) is used in reference to systems of exceptionally high processing capacity (such as individual supercomputers, which are typically highly parallel systems),
in the sense that they handle a substantial workload and deliver results over a relatively short period of time. By contrast, \textbf{High-Throughput Computing} (HTC) involves harnessing a wider pool
of more conventional resources in order to deliver a considerable amount of computational power, although potentially over longer periods of time.
\begin{quote}
\textit{``HPC brings enormous amounts of computing power to bear over relatively short periods of time. HTC employs large amounts of computing power for very lengthy periods.''}~\cite{htc}.
\end{quote}
In practice, the term \textit{HTC} covers most cases of Grid Computing, where remote resources are managed for the benefit of the end user and are often made available on a prioritized and/or
opportunistic basis (e.g. the so-called ``spare cycles'' or ``backfilling'', utilizing temporary drops in resource utilization at certain sites to deploy additional workload, thus increasing the overall system
throughput). A majority of the computational tasks of the LHC experiments were completed using standard off-the-shelf equipment rather than supercomputers. It is important to note, however,
that modern Workload Management Systems can be adapted to deliver payload to HPC systems such as the Leadership Class Facilities in the US, and such efforts are currently under way~\cite{panda_chep13}.
\subsubsection{The Role of Data Management}
In most cases of interest to us, data management plays a crucial role in reaching the scientific goals of an experiment. It is covered in detail separately (see Section~\ref{data}).
As noted above, it represents the \textit{data flow} component of the overall workflow management and therefore needs to be addressed here as well.
In a nutshell, we can distinguish between two different approaches to handling data -- on one hand, managed replication and transfers to sites, and on the other hand, network-centric
methods of access to data repositories such as \xrootd~\cite{xrootd,xrootd_web}.
An important design characteristic which varies widely among Workflow Management Systems is the degree of coupling to the Data Management components. This has a significant impact
on the reusability of these systems, as tighter coupling usually entails a larger and more complex software stack than would otherwise be optimal, among other consequences.
\subsection{Examples}
\label{wms_examples}
\subsubsection{The Scope of this Section}
Workflow and Workload Management, especially taken in conjunction with Data Management (an area with which they are typically interconnected), is a vast
subject, and covering the features of each example of a WMS in any detail would go well beyond the scope of this document. In the following, we provide references
to those systems which are most immediately relevant to HEP and related fields.
\subsubsection{HTCondor and glideinWMS}
\label{htcondor}
HTCondor~\cite{htcondor} is one of the best known and most important sets of Grid and High-Throughput Computing (HTC) tools. It provides an array of functionality, such as
a batch system solution for a computing cluster, remote submission to Grid sites (via its \textit{Condor-G} extension) and automated transfer (stage-in and stage-out) of data.
In the past decade, HTCondor was augmented with a Workload Management System layer known as \textit{glideinWMS}~\cite{glideinwms}. The latter abstracts remote
resources (Worker Nodes) residing on the Grid and effectively creates a local (in terms of the interface) resource pool accessible by the user.
Putting these resources behind the standard HTCondor interface, with its familiar set of utilities, is highly beneficial to users already familiar with HTCondor, since it greatly shortens the learning curve.
On the other hand, deployment of this system is not always trivial and typically requires a central service (the so-called ``glidein factory'') to be operated at the desired service level.
HTCondor has other notable features. One of the most basic parts of its functionality is the ability to transfer data consumed and/or produced by the payload job according to
rules set by the user. This works well in local cluster situations and is somewhat less reliable when utilized at scale in the Grid environment.
One of the HTCondor components, \textit{DAGMan}, is a meta-scheduler which uses DAGs (see~\ref{workflow_workload}) to manage workflows~\cite{dagman}. In recent years,
HTCondor has also been augmented with Cloud-based methodologies and protocols (cf.~\ref{cloud}).
\subsubsection{Workload Management for LHC Experiments}
The table below lists the systems (with a few bibliographic references) utilized by the major LHC experiments; for each, we identify the components representing the layers or subdomains of Workload Management, Workflow Management and Data Management:
\begin{center}
\begin{tabular}{ c | c | c | c }
\hline
\textbf{Project} & \textbf{Workload Mgt} & \textbf{Workflow Mgt} & \textbf{Data Mgt}\\ \hline \hline
ATLAS & PanDA~\cite{panda_chep10} & ProdSys2 & Rucio~\cite{rucio_chep13}\\ \hline
CMS & GlideinWMS~\cite{glideinwms} & Crab3~\cite{crab3_chep12} & PhEDEx~\cite{phedex_chep09,phedex_chep10}\\ \hline
LHCb & DIRAC~\cite{dirac_acat09} & DIRAC Production Mgt & DIRAC DMS\\ \hline
ALICE & gLite WMS~\cite{glite_chep09} & AliEn~\cite{alien_chep07} & AliEn~\cite{alien_chep07}\\
\hline
\end{tabular}
\end{center}
\subsubsection{@HOME}
\label{at_home}
There are outstanding examples of open-source middleware systems for volunteer and grid computing, such as BOINC~\cite{boinc} (the original platform behind SETI@HOME),
FOLDING@HOME and MILKYWAY@HOME~\cite{mwathome}. Central to their design is a server-client architecture, where the clients can run on a variety of platforms,
such as PCs and game consoles made available to specific projects by volunteering owners. Deployment on a cluster or a farm is also possible.
While this approach to distributed computing will not work well for most experiments at the LHC scale (where moving significant amounts of data presents a perennial problem),
it is clearly of interest to smaller-scale projects with more modest I/O requirements. Distributed platforms in this class have been deployed, validated and used at scale.
\subsubsection{European Middleware Initiative}
The European Middleware Initiative~\cite{emi} is a consortium of Grid service providers (such as ARC, dCache, gLite, and UNICORE).
It plays an important role in the Worldwide LHC Computing Grid (WLCG). The \textit{gLite}~\cite{glite_chep09} middleware toolkit was used by the LHC experiments as one of the methods
to achieve resource federation on the European Grids.
\subsubsection{Fermi Space Telescope Workflow Engine}
The Fermi workflow engine was originally developed to process data from the Fermi Space Telescope on the SLAC batch farm. The goal was to simplify and
automate the bookkeeping for tens of thousands of daily batch jobs with complicated dependencies, all running on a general-use batch farm, while
minimizing the (distributed) manpower needed to operate it. Since it is a general workflow engine, it is easily extended to all types of processing, including
Monte Carlo simulation and routine science jobs. It has been extended with more batch interfaces and is routinely used to run jobs at IN2P3 Lyon,
all while being controlled by the central installation at SLAC. It has also been adapted for use by EXO and SuperCDMS.
\subsection{Common Features}
\subsubsection{``Pilots''}
As we mentioned in~\ref{from_grid_to_workload}, one of the primary functions of a WMS is to insulate the user from the heterogeneous and sometimes complex
Grid environment and from certain failure modes inherent in it (e.g. misconfigured sites, transient connectivity problems, ``flaky'' worker nodes, etc.).
There is a proven solution to these issues, which involves the so-called \textbf{late binding} approach to the deployment of computational payload.
According to this concept, it is not the actual ``payload job'' that is initially dispatched to a Worker Node residing in a
remote data center, but an intelligent ``wrapper'', sometimes termed a \textbf{``pilot job''}, which first validates the resource, its configuration and
some details of the environment (for example, outbound network connectivity may be tested). In this context, ``binding'' refers to the matching process
whereby a payload job (such as a production job or a user analysis job), submitted to the WMS and awaiting execution, is assigned to a
live pilot which has already performed validation and configuration of the execution environment (for this reason, this technique is sometimes referred
to as \textbf{``just-in-time workload management''}). Where and how exactly this matching happens is a design decision: in PanDA it is done by
the central server, whereas in DIRAC the process takes place on the Worker Node, utilizing the \textit{JobAgent}~\cite{dirac_chep10}.
With proper design, \textbf{late binding} brings about the following benefits:
\begin{itemize}
\item The part of the overall resource pool that exhibits problems prior to the actual job dispatch is excluded from the matching process by design.
This eliminates a very large fraction of potential failures that the user would otherwise have to deal with and account for, since the resource pool
exposed to the user (or to an automated client performing job submission) is effectively validated.
\item Some very useful diagnostic and logging capability may reside in the pilot. This is very important for troubleshooting and monitoring, which we shall discuss later.
Problematic resources can be identified and flagged at both site and worker node level.
\item In many cases, the overall latency of the system (in the sense of the time between the job submission by the user, and the start of actual execution) will be reduced --
due to the pilot waiting to accept a payload job -- leading to a more optimal user experience (again, cf. the ``just-in-time'' reference).
\end{itemize}
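To make the pilot concept more concrete, the following is a deliberately simplified sketch of a pilot's main loop. The endpoint, message format and function names are hypothetical, and no particular WMS protocol (PanDA, DIRAC or otherwise) is implied; the essential point is that the resource is validated first, and only then is a payload fetched and executed (late binding).
\begin{verbatim}
# Schematic pilot: validate the worker node first, then ask the central
# service for a matching payload ("late binding") and run it. The WMS
# endpoint and message format here are entirely hypothetical.
import json, shutil, socket, subprocess, urllib.request

WMS_URL = "https://wms.example.org/getjob"     # placeholder endpoint

def validate_node(min_free_gb=10):
    """Cheap sanity checks before any payload is requested."""
    free_gb = shutil.disk_usage("/tmp").free / 1e9
    try:
        socket.create_connection(("wms.example.org", 443), timeout=5).close()
        outbound_ok = True
    except OSError:
        outbound_ok = False
    return outbound_ok and free_gb > min_free_gb

def fetch_payload():
    """Ask the server for a job matching this (validated) resource."""
    with urllib.request.urlopen(WMS_URL, timeout=30) as resp:
        return json.load(resp)     # e.g. {"id": 42, "cmd": ["sh", "run.sh"]}

def run():
    if not validate_node():
        return                     # bad resources never receive payloads
    job = fetch_payload()
    result = subprocess.run(job["cmd"], capture_output=True)
    # ... report status and logs back to the server (omitted in this sketch)
\end{verbatim}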
DIRAC was one of the first systems where this concept was proposed and successfully implemented~\cite{dirac_acat09}. This approach also forms the architectural core
of the PanDA WMS~\cite{panda_chep12}.
In distributed systems where the resources are highly opportunistic and/or ephemeral, such as the volunteer computing we mentioned in~\ref{at_home}, this variant
of the client-server model is the essential centerpiece of the design. In BOINC, the ``core client'' (similar to a ``pilot'') performs functions such as maintaining
communications with the server, downloading the payload applications, logging and others~\cite{boinc_client}.
In HTCondor and glideinWMS (see~\ref{htcondor}) there is no concept of a sophisticated pilot job or core client, but there is a \textit{glidein} agent, which is deployed on Grid resources
and which is a wrapper for the HTCondor daemon process (\textit{startd}). Once the latter is initiated
on the remote worker node, it joins the HTCondor pool. At this point, matching of jobs submitted by the user to HTCondor \textit{slots} becomes possible~\cite{glideinwms}.
While this ``lean client'' provides fewer benefits than more complex ``pilots'', it also belongs to the class of late-binding workload management systems, albeit at a simpler level.
\subsubsection{Monitoring}
The ability of the user or operator of a WMS to gain immediate and efficient access to information describing the status of jobs, tasks (i.e. collections of jobs) and operations performed on the data
is an essential feature of a good Workload Management System~\cite{pandamon_chep10}. For operators and administrators, it provides crucial debugging and troubleshooting
capabilities. For the users and production managers, it allows better diagnostics of application-level issues and performance, and helps to better plan the user's
workflow~\cite{pandamon_isgc14}.
Of course, all three domains involved in the present discussion (Workflow, Workload, Data) benefit from monitoring capability.
\subsubsection{The Back-end Database}
The power of a successful WMS comes in part from its ability to effectively manage state transitions of the many objects present
in the system (units of data, jobs, tasks, etc.). This is made possible by utilizing a database to keep the states of these objects.
Most current implementations rely on an RDBMS for this function (e.g. ATLAS PanDA is using the Oracle RDBMS at the time of writing).
The database serves both as the core instrument of the ``brokerage'' logic (matching of workload to resources) and as the source
of data for many kinds of monitoring.
In seeking shared and reusable solutions, we would like to point out that it is highly desirable to avoid coupling the WMS application code to a particular
type or flavor of database system (e.g. Oracle vs MySQL). Such a dependency may lead to difficulties in deployment, due to the available
expertise and maintenance policies at the target organization, and in some cases even to licensing costs (cf. Oracle). Solutions such as an ORM layer
or other methods of database abstraction
should be utilized to make it possible to use a variety of products as the back-end DB for the Workload Management System,
without the need to rewrite any significant amount of its code. This is all the more important in light of the widening adoption of NoSQL
technologies in industry and research, since the possibility of a future migration to a new type of DB must remain open.
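A minimal sketch of such database abstraction, assuming the SQLAlchemy ORM (version 1.4 or later) and a hypothetical job-state table, is shown below. The point is that switching the back end (SQLite here, Oracle or MySQL in production) is confined to the connection URL and does not touch the model or query code.
\begin{verbatim}
# Minimal ORM sketch: the job table is declared once; the choice of
# back-end database is confined to the connection URL.
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Job(Base):
    __tablename__ = "jobs"
    id = Column(Integer, primary_key=True)
    state = Column(String(32), default="defined")  # defined/activated/running/done
    site = Column(String(64), nullable=True)

# "sqlite:///wms.db" could become "oracle+cx_oracle://..." or
# "mysql+pymysql://..." without changing the model or the queries.
engine = create_engine("sqlite:///wms.db")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

with Session() as session:
    session.add(Job(state="activated", site="SITE_A"))
    session.commit()
    n_active = session.query(Job).filter_by(state="activated").count()
    print(n_active)
\end{verbatim}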
\subsection{Intelligent Networks and Network Intelligence}
Once again, the issue of network performance monitoring, and of applying such information to improve
the throughput and efficiency of workflows, belongs to two domains: Workload Management and Data Management. Network performance monitoring is not in itself a new subject, and effective monitoring tools have been deployed at scale~\cite{perfsonar_chep12}. Until recently, however, network performance data was not
widely used in ``near time'' as a crucial factor in managing workload distribution in the HEP domain. In this approach, network performance information is aggregated from a few sources, analyzed, and used to determine a more optimal placement of jobs relative to the data~\cite{panda_chep13}.
A complementary strategy is the application of ``Intelligent Networks'', whereby data conduits of specified bandwidth
are created as needed using appropriate SDN software and utilized for optimized data delivery to the processing location
according to the WMS logic.
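A toy illustration of folding network measurements into the placement decision is sketched below; the throughput figures and site names are invented. It simply extends the kind of brokerage score discussed in~\ref{from_grid_to_workload} with an estimated stage-in time based on the measured throughput between the site holding the data and each candidate execution site.
\begin{verbatim}
# Toy network-aware placement: prefer the execution site with the smallest
# estimated stage-in time from the site holding the input data.
# Throughput values and site names are invented for illustration.
measured_mbps = {                     # e.g. aggregated from network probes
    ("SITE_B", "SITE_A"): 800.0,
    ("SITE_B", "SITE_C"): 90.0,
}

def stage_in_hours(data_site, exec_site, dataset_gb):
    """Estimated hours to move the input if it is not already local."""
    if data_site == exec_site:
        return 0.0
    mbps = measured_mbps.get((data_site, exec_site), 10.0)  # pessimistic default
    return dataset_gb * 8e3 / mbps / 3600.0

candidates = ["SITE_A", "SITE_C"]
best = min(candidates, key=lambda s: stage_in_hours("SITE_B", s, 500))
print(best)   # SITE_A: roughly 1.4 h stage-in instead of about 12 h at SITE_C
\end{verbatim}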
\subsection{Opportunities for improvement}
% \fixme{Outline (TBD -mxp-)}
\subsubsection{Focus Areas}
\label{wms_focus}
The technical characteristics of Workload Management Systems currently used in HEP and related fields, including their scalability, are regarded
as sufficient to cover a wider range of applications, and existing examples support that
(cf. LSST and AMS utilizing PanDA~\cite{pandamon_isgc14}). Therefore, the focus needs to be not on entirely new solutions
but on characterization of the existing systems in terms of reusability, ease of deployment and maintenance, and efficient interfaces.
We further itemize this as follows:
\begin{itemize}
\item \textbf{Modularity} - addressing the issue of the ``Three Domains''~(\ref{3domains}):
\begin{itemize}
\item \textbf{WMS \& Data}: the interface of a Workload Management System to the Data Management System needs to be designed in a way that
excludes tight coupling of one to the other. This will allow deployment of an optimally scaled and efficient WMS in environments where a pre-existing
data management system is in place, or where installation of a particular data management system puts too much strain on local resources. For example,
replicating an instance of a high-performance and scalable system like Rucio, which is currently used in ATLAS, would be prohibitively expensive for
a smaller research organization.
\item \textbf{Workflow Management}: the concept of scientific workflow management is an old one, but it has recently come to the fore due to the
increased complexity of data transformations in many fields of research, both in HEP and in other disciplines. We recommend investigation of both existing
and new solutions in this area, and the design of proper interfaces between Workflow Management systems and the underlying Workload Management systems.
\end{itemize}
\item \textbf{Pilots}: the technique of deploying Pilot Jobs to worker nodes on the Grid adds robustness, flexibility and adaptability to the system.
It has proven very successful at extreme scale, and the use of this technique should be encouraged. Creating application-agnostic templates of pilot code
which can be reused by different experiments at scales smaller than the LHC could be a cost-effective way to leverage this technique.
\item \textbf{Monitoring}:
\begin{itemize}
\item \textbf{Value}: A comprehensive and easy-to-use monitoring system has a large impact on the overall productivity of personnel operating a Workload Management System.
This area deserves proper investment of effort.
\item \textbf{Flexibility}: The requirements of experiments and other projects will vary, hence the need for flexible and configurable solutions
for the Monitoring System utilized in Workload Management and Data Management.
\end{itemize}
\item \textbf{Back-end Database}: Ideally, there should be no coupling between the WMS application code and the features of the specific database system
utilized as its back end, since such coupling hampers reusability. These dependencies can be factored out and abstracted using techniques such as ORM.
\item \textbf{Networks}: There are significant efficiencies to be obtained by utilizing network performance data in the workload management process. Likewise,
intelligent and configurable networks can provide optimal bandwidth for workflow execution.
\item \textbf{Cloud}: Workload Management Systems deployed in this decade and beyond must be capable of operating efficiently in both Grid and Cloud environments,
as well as in hybrid deployments.
\end{itemize}
\subsubsection{Summary of Recommendations}
\subsubsection*{WMS Inventory}
We recommend that future activities of an organization such as the FCE include an assessment of the major existing Workload Management Systems,
using the criteria outlined in~\ref{wms_focus}, such as:
\begin{itemize}
\item Modularity, which would ideally avoid deployment of a monolithic solution and instead allow the
proper platforms and technologies to be utilized as needed in the Data Management, Workload Management and Workflow Management domains.
\item Flexibility and functionality of monitoring.
\item Reduced or eliminated dependency on the type of database system utilized as the back end.
\item Transparency and ease of utilization of Cloud resources.
\end{itemize}
This assessment will be useful in a few complementary ways:
\begin{itemize}
\item It will serve as a ``roadmap'' helping organizations that are looking to adopt a system for processing their workflows in the Grid and Cloud environments
to make technology choices and avoid duplication of effort.
\item It will help identify areas where additional effort is needed to improve the existing systems in terms of reusability and ease of maintenance (e.g.
implementing a more modular design).
\item It will summarize best practices and experience to drive the development of the next generation of distributed computing systems.
\item It may help facilitate the development of an interoperability layer in existing WMS, which would allow future deployments to mix and match components
from existing solutions in an optimal manner.
\end{itemize}
Further, this assessment should also contain a survey of existing ``external'' (i.e. open-source and community-based) components that can be utilized
in existing and future systems, given proper interface design. The goal of this part of the exercise is to identify cases where software reuse may help
to reduce development and maintenance costs. For example, there are existing systems for flexible workflow management which have not been
extensively evaluated for use in HEP and related fields.
It must be recognized that, due to the complexity of the systems being considered, the development of this assessment document will not be a trivial
task and will require an appropriate allocation of effort. However, given the sheer scale of deployment of modern WMS and the considerable cost of the resources
required for their operation, in terms of both hardware and human capital, such an undertaking will be well justified.
\subsubsection*{Cloud Computing}
HEP experiments are entering the era of Cloud Computing. We recommend a continuation of the effort aimed at investigating and putting into practice
methods and tools to run scientific workflows in the Cloud. A careful cost/benefit analysis must be performed for the relevant use cases.
In addition to extending existing WMS to the Cloud, we should also work in the opposite direction, i.e. maintain efforts to evaluate components of frameworks such as OpenStack~\cite{openstack} for possible ``internal'' use in HEP systems.