\section{Survey of Current Landscape}
% \fixme{(authors: Anders, Brett, all)}
This section presents a general overview of the current landscape of
HEP libraries and tools.
%
First we list general patterns that run counter to cross-experiment sharing.
%
% what about general beneficial patterns?
%
Second, we give a prioritized list of activities in which involvement
by the HEP-FCE could be beneficial.
\subsection{Forces Counter to Cross-experiment Software}
\label{subsec:pitfalls}
The working group expects that sharing software libraries and tools
between experiments more frequently than is currently done would
increase overall productivity. Independent of cross-experiment
sharing, designing and implementing software in a more general manner
is expected to be beneficial. The working group identified several
reasons why such general-purpose software is not as predominant as it
could be.
\subsubsection{Up-front Effort}
Designing and implementing software to solve a general problem instead
of the specific instance faced by one experiment can take more effort
initially. Solving ``just'' the problem one immediately faces is
cheaper in the short term. If the problem is short-lived and the
software is then abandoned, this strategy can be a net benefit. More
often, however, fixes for new problems compound one another, and the
software becomes brittle and narrowly focused, increasingly difficult
to maintain, and ever harder to extend.
\subsubsection{Lack of Expertise}
Physicists have always been multidisciplinary, covering all aspects of
an experiment from hardware design, bolt turning, operations and
project management to data analysis and software development. As data
rates have increased, algorithms have become more complex, and
networking, storage and computation technology more advanced, the
requirement that a physicist also be a software developer has become
more challenging to meet while maintaining the needed capabilities in
the other disciplines. As a consequence, some experiments, especially
the smaller ones, lack the software expertise, and the knowledge of
what is available, needed to develop general software solutions or to
adopt existing ones. This leads to the same result of solving
``just'' the immediate problem, with the consequences described above.
\subsubsection{Ignoring Software Design Patterns}
A specific example of lack of expertise manifests in developers who
ignore basic, tried and true software design patterns. This can be
seen in software that lacks any notion of interfaces or layering
between different functionality. Often new features are developed by
finding a spot that ``looks good'' and pasting in some more code to
achieve an immediate goal with no understanding of the long-term consequences.
As with the up-front effort problem, this strategy is often rewarded:
the individual produces desired results quickly, and the problems such
changes cause do not become apparent until later.
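As a minimal illustration of the kind of layering this paragraph refers to (all names here are hypothetical, not drawn from any specific experiment's code), defining a small interface separates what callers depend on from any one implementation, so new features get an obvious home instead of being pasted into a spot that ``looks good'':

```python
from abc import ABC, abstractmethod


class HitFinder(ABC):
    """Interface layer: callers depend only on this contract."""

    @abstractmethod
    def find_hits(self, raw_waveform):
        """Return a list of (sample_index, amplitude) hits."""


class ThresholdHitFinder(HitFinder):
    """One concrete implementation; alternatives can be swapped in."""

    def __init__(self, threshold):
        self.threshold = threshold

    def find_hits(self, raw_waveform):
        # Keep only samples above threshold, with their indices.
        return [(i, v) for i, v in enumerate(raw_waveform)
                if v > self.threshold]


def reconstruct(finder: HitFinder, waveform):
    # New behavior belongs in a new HitFinder, not pasted in here.
    return finder.find_hits(waveform)
```

A caller such as `reconstruct(ThresholdHitFinder(5.0), [0, 7, 2, 9])` then works unchanged when a different `HitFinder` implementation is substituted.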
\subsubsection{Limited Support}
Some experiments have a high degree of software expertise. These
efforts may even naturally produce software that can have some
cross-experiment benefit. However, they lack the necessary ability to
support their developers to make the final push needed to offer that
software more broadly. In many cases they also do not have the ability
to assure continued support of the software for its use by others.
In the best cases, some are able to provide support on a limited or
best-effort basis. While this helps others adopt the software, it
still leaves room for improvement: a modest amount of expert time can
save a large amount of time for many novices.
\subsubsection{Transitory Members}
Many software developers in an experiment are transitory. After
graduate students and post-docs make a contribution to the software
development and the experiment in general they typically move on to
other experiments in the advancement of their careers. In part, this migration
can help disseminate software between experiments but it also
poses the problem of retaining a nucleus of long-term knowledge and
support around the software they developed.
\subsubsection{Parochial View}
In some cases, beneficial software sharing is hampered by experiments,
groups, labs, etc.\ that suffer from the infamous ``not invented
here'' syndrome. A parochial view leads to a preference for solutions
from within the unit rather than venturing out to survey a broader
landscape where better, more general solutions are likely to be found.
Parochialism compounds itself by making it ever more difficult for
motivated people to improve the entrenched systems by bringing in more
general solutions.
\subsubsection{Discounting the Problem}
Some physicists have a tendency to discount software and computing
solutions. This viewpoint may originate in experience from a time
when software and computing solutions were indeed not as important as
they are now. It may also
come as a consequence of that person enjoying the fruits of high
quality software and computing environments and being ignorant of the
effort needed to provide and maintain them. Whatever the origin,
underestimating the importance of developing quality software tools leads
to inefficiency and lack of progress.
\subsubsection{Turf Wars}
Software development is a personal and social endeavor. It is natural
for someone who takes pride in that work to become personally attached
to the software they develop. In some cases this can cloud judgment
and lead to retaining software in its current state when it would be
more beneficial to refactor it, or to discard and reimplement it.
What are really prototypes can become too loved to be replaced.
\subsubsection{Perceived Audience and Development Context}
The group made the observation that cues from the audience for the
software and the context in which it is developed lead to shifts in
thinking about a software design. For example, resulting designs tend to be more narrowly
applicable when one knows that the code will be committed to a private
repository only accessible by a single collaboration. On the other hand,
when one is pushing commits to a repository that is fully accessible
by a wide public audience one naturally thinks about broader use cases and
solutions to more general problems.
\subsubsection{Disparate Communications}
Different experiments and experiment-independent software projects
have differing means of communicating. Technical support, knowledge
bases, software repositories, bug trackers, release announcements are
all areas with no standard implementation. Some groups even maintain
multiple instances of each of these channels.
Independent of this, different policies mean that not all information
may be publicly available. These all pose hurdles for the sharing of
software between groups.
\subsubsection{Design vs. Promotion}
For general purpose software to be beneficial across multiple
experiments, it needs at least two things: it must be designed and
implemented in a general-purpose way, and it must be promoted so that
potential adopters learn of its suitability. Often the sets of
individuals who excel at the former and at the latter have little
overlap.
\subsubsection{Decision Making}
An experiment's software is no better than the best software expert
involved in the decision making process used to provide it. And it's often worse.
Decision making is a human activity and as such can be driven by the
loudest argument rather than the most sound one. Many times, choices
are made in a vacuum, lacking suitable opposition. At times they are
made with no decision-making policy and procedures in place; where
such policies exist, they may be ignored, or followed without the
information needed to reach an informed decision. Politics and
familiarity can trump rationality and quality.
\subsubsection{Getting Off on the Wrong Foot}
There is often no initial review of what software is available when a new experiment begins.
Frequently a physicist charged with software duties on an experiment
will jump in and begin to do things the way they were done in their
last project, thus propagating and baking in inefficiencies for
another generation. No time is spent examining what has changed since
the earlier experiment's software was designed, so whole evolutions in
ways of thinking, or newly available tools, may be missed.
\subsection{Best Practices for Experiments}
\label{subsec:bestpractices}
\subsubsection{Look around}
New experiments should survey and understand the current state of the
art for software libraries and tools (and Systems and Applications as
covered by the other two working groups). Periodically, established
experiments should do likewise to understand what improvements they
may adopt from or contribute to the community. Experts from other
experiments should be brought in for in-depth consultation even in
(especially in) cases where the collaboration feels there is
sufficient in-house expertise.
\subsubsection{Early Development}
Certain decisions, if made and implemented early, can save a great
deal of effort later. Experiments should take these
seriously and include them in the conceptual and technical design
reports that are typically required by funding agencies. These
include the following:
\begin{description}
\item[data model] Detailed design for data model schema covering the
stages of data production and processing including: the output of
detector simulation (including ``truth'' quantities), the output of
``raw'' data from detector DAQ and the data produced by and used as
intermediaries in reconstruction codes.
\item[testing] Unit and integration testing methods, patterns and
  granularity. These should not depend on, or be otherwise tied to,
  other large-scale design decisions such as a potential event
  processing software framework.
\item[namespaces] Design broad-enough namespace rules (unique filenames,
event numbering conventions, including re-processed event version tags) to
encompass the entire development, operations and legacy aspects of the
experiment, which may span decades and involve worldwide distributed
data stores. Filenames or, in larger experiments, the meta-system
which supports file access and movement, should have unique identifiers
not just for given events or runs at a single location, but ones that
remain unique even when a file is moved and mixed with similar files
at remote locations
(i.e. filename provenance should not rely upon directory path for
uniqueness). One should be able to distinguish development versions
of files from production versions.
If the same dataset is processed multiple times, the filenames or
other metadata or provenance indicators should be available which
uniquely track the processing version. The same goes for code:
software versions must be tracked clearly and comprehensively across the
distributed experiment environment (i.e. across multiple
institutions, experiment phases and local instances of repositories).
\item[scale] Understand the expected scale and complexity of the
  software and of its development effort and developer community.
  Determine whether an event processing framework is needed, whether a
  toolkit-library approach is sufficient, or whether ad hoc
  development strategies are enough.
\item[metadata] Determine what file metadata will be needed across the
  entire effort of the collaboration. Include raw data and the
  requirements for its production as well as simulation and processed
  data. Consider what file metadata will be needed to support
  large-scale production simulation and processing, and what may be
  needed to support ad hoc file productions by individuals or small
  groups within the collaboration.
\end{description}
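To make the namespace and metadata recommendations above concrete, the following minimal sketch (all field names and formats are hypothetical, chosen only for illustration) shows a filename convention that encodes experiment, data tier, run number and processing version directly in the name, so that uniqueness and provenance survive when files are moved between sites and do not rely on directory paths:

```python
import re

# Hypothetical naming convention (illustrative only): every field
# needed for uniqueness lives in the name itself, so provenance does
# not depend on directory path or hosting site.
NAME_FMT = "{exp}_{tier}_run{run:08d}_proc{proc}.root"
NAME_RE = re.compile(
    r"(?P<exp>[A-Za-z0-9]+)_(?P<tier>[A-Za-z0-9]+)"
    r"_run(?P<run>\d{8})_proc(?P<proc>v\d+)\.root")


def make_name(exp, tier, run, proc):
    """Build a globally unique filename from its metadata fields."""
    return NAME_FMT.format(exp=exp, tier=tier, run=run, proc=proc)


def parse_name(name):
    """Recover metadata from a filename, e.g. after files are mixed."""
    m = NAME_RE.fullmatch(name)
    return m.groupdict() if m else None
```

Under this scheme, re-processing the same run simply increments the processing version (``procv2'' becomes ``procv3''), yielding a distinct name, and the metadata can be recovered from the filename alone wherever the file ends up.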
\subsection{Areas of Opportunity}
Each of the following sections focuses on one particular \textit{area of opportunity} for improving how the community shares libraries and tools between experiments. In each area of opportunity we present:
\begin{itemize}
\item A description of the area.
\item A number of case studies of existing or past software libraries and tools including concrete examples of what works and what does not.
\item Specific aspects that need improvement and an estimate of the effort needed to achieve it.
\end{itemize}