\textbf{{Sandia National Laboratories $\>$$\>$$\>$ Principal Member of Technical Staff II $\>$$\>$$\>$Jul 2024 - present}}
\begin{itemize}
\item Owner of Kokkos Tools in the Kokkos Software Ecosystem (now part of the Linux Foundation), which provides profiling and debugging capabilities for Kokkos (performance-portable parallel) programs, as well as sophisticated auto-tuning and performance analysis capabilities.
\item Project Technical Manager for a team of 5 researching software correctness tools and formal methods for Kokkos.
\item Contributing to the OpenMP and OpenACC specifications on tooling capabilities.
\item Developing AI-assisted tools for code transformations and auto-tuning of Kokkos programs.
%loop transformation and tasking features that enrich autotuning capabilities of Kokkos Tools.
%\item Maintaining Kokkos Tools project, which is part of the Linux Foundation as of Apr. 2024.
\item Developing software packages for tools in HPC software stacks, in particular using Spack.
\end{itemize}
\textbf{{Sandia National Laboratories $\>$$\>$$\>$ Senior Member of Technical Staff $\>$$\>$$\>$Aug 2022 - Jul 2024}}
\begin{itemize}
\item Developing and testing features in the US DOE's LLVM OpenMP implementation.
\item Contributing to the OpenMP 6.0 specification, specifically on affinity, loop transformations, accelerators, and tasking.
\item Prototyping tunable locality-aware loop scheduling strategies, and user-defined loop schedules more generally, in LLVM's OpenMP implementation.
\item Owner of Kokkos Tools in the Kokkos Software Ecosystem, which provides profiling and debugging capabilities for Kokkos (performance-portable parallel) programs, as well as sophisticated auto-tuning and performance analysis capabilities.
\item Contributor to the DOE ASCR X-Stack project on automated test generation for parallel programs via LLVM. Developing a source-to-source translator, via a ROSE compiler plugin for Clang's AST rewriter, that translates a Kokkos program into a Kokkos Model program (a simplified version of Kokkos) for analysis by the LLVM-based KLEE symbolic execution engine.
\end{itemize}
\dates{May 2019 - August 2022}
\location{Upton, New York, USA}
\title{Computational Scientist}
\employer{Brookhaven National Laboratory}
\textbf{{Brookhaven National Laboratory $\>$$\>$$\>$$\>$Computational Scientist$\>$$\>$$\>$$\>$May 2019 - Aug 2022}}
\begin{itemize}
\item Contributed to developing LLVM's OpenMP implementation, specifically its compiler and runtime, targeted for the Department of Energy's upcoming exascale supercomputer platforms.
\item Designed and implemented OpenMP task-to-multi-GPU scheduling strategies to improve within-node load balancing of applications running on supercomputers with multiple GPUs per node.
\item Developed tunable locality-aware loop scheduling strategies, and user-defined loop schedules more generally, in LLVM's OpenMP implementation for MPI+OpenMP applications running on supercomputers with multicore processors and GPUs.
\item Contributed to the OpenMP Language Committee on support for OpenMP parallelization across the multiple GPUs of a node (for C, C++, and Fortran) and for user-defined schedules in OpenMP.
\item Developed benchmarks and evaluated OpenMP implementations, e.g., LLVM's OpenMP and NVIDIA's OpenMP, on exascale supercomputers.
\item Led hackathons (including virtual ones) on using OpenMP on the Department of Energy's exascale supercomputers.
\item Served as Technical Project Manager for SOLLVE, the US DOE Exascale Computing Project's project to develop LLVM's OpenMP.
\item Represented Brookhaven National Laboratory in the OpenMP Architecture Review Board.
\end{itemize}
\dates{June 2018 - April 2019}
\location{Champaign, Illinois, USA}
\title{Software Developer}
\employer{Charmworks, Inc.}
\textbf{Charmworks, Inc. $\>$$\>$$\>$$\>$Software Developer$\>$$\>$$\>$$\>$Jun 2018 - Apr 2019}
%\begin{position}
\vspace{0.0in}
\begin{itemize}
\item Collaborated with Lawrence Livermore National Lab on a proposal for a synergistic loop scheduling and load balancing strategy.
\item Worked with Oak Ridge National Lab, through the DOE Exascale Computing Project, on making user-defined loop scheduling portable across different parallel programming libraries.
\item Added examples of OpenMP loop scheduling to the Examples document of the OpenMP specification.
\item Worked on an NSF SBIR startup proposal for loop scheduling on desktop computers.
\item Collaborated on developing a proposal, based on an OpenMPCon 2017 paper, to add user-defined schedules to the OpenMP specification, presenting it at the OpenMP F2F in Santa Clara and the upcoming F2F in Toronto.
\item Worked on papers on user-defined loop scheduling for publication.
\item Assisted with pitch and marketing slides for the Charm++ software, and provided feedback on Charm++ tutorials.
\item Integrated a shared memory library for sophisticated loop scheduling strategies, including some based on my dissertation, into the current version of Charm++.
%item Comparing performance of a loop scheduling strategy available in the integrated shared memory library with the performance of the corresponding loop scheduling strategy available in LLVM’s OpenMP library.
\end{itemize}
%\end{position}
\textbf{University of Southern California / ISI $\>$$\>$$\>$$\>$Computer Scientist$\>$$\>$$\>$$\>$Dec 2016 - Jun 2018}
\vspace*{-0.0in}
\begin{itemize}
\item Worked with a postdoc from LLNL on a proposal to study
techniques that combine loop scheduling and load balancing to improve
the performance of scientific applications.
\item Worked with the OpenMP Language Committee to support user-defined loop schedules in OpenMP.
\item Translated an X-ray tomography code from MATLAB to C and then
parallelized it to run on a supercomputer whose nodes have GPGPUs.
\item Modified the LLVM compiler to support new OpenMP loop schedules.
\item Ensured, using the Globus Toolkit, that the external network infrastructure for transferring an application code's input data files was adequate
for the application's efficient execution.
\item Worked in a team to manage the computational performance of an application involving Fast Fourier Transform and image reconstruction algorithms.
%\item \small Doing optimizations for MPI+CUDA application code involving low-overhead loop scheduling and loop optimizations such as loop unrolling.
%\item \small Working on transformations in LLVM.
\end{itemize}
%TODO: adaptive VS hybrid VS ...
\textbf{Charmworks, Inc.$\>$$\>$$\>$$\>$Developer$\>$$\>$$\>$$\>$Jan 2016 - Nov 2016}
\vspace*{-0.0in}
\begin{itemize}
\item Implemented mixed static/dynamic loop scheduling
strategies within Charm++'s thread scheduling library.
%TODO: consider adding 'including in cloud environments' the end of
%the sentence.
%TODO: make paragraph
\item Helped to improve portability of Charm++ to a variety of platforms.
\item Assisted with business aspects of a high-tech startup.
\end{itemize}
\textbf{University of Illinois$\>$$\>$$\>$$\>$Postdoctoral Associate$\>$$\>$$\>$$\>$Jul 2015 – Dec 2015}
\vspace*{-0.0in}
\begin{itemize}
\item Developed a library that allows application programmers to use the strategies from my dissertation.
\item Adapted a plasma physics application code to run on a
GPGPU and on the Intel Xeon Phi.
\item Incorporated over-decomposition and locality-aware scheduling into the strategies from my dissertation.
\end{itemize}
\textbf{Lawrence Livermore Nat’l Lab$\>$$\>$$\>$$\>$Lawrence Scholar$\>$$\>$$\>$$\>$Feb 2012 – Jun 2014}
\vspace*{-0.0in}
\begin{itemize}
\item Measured MPI communication delays for micro-benchmark codes run on supercomputers and worked to find tools to measure the dequeue overheads of OpenMP loop schedulers.
\item Created a software system for automated performance optimization of low-overhead hybrid scheduling
strategies and for improving their usability by application programmers.
\item Developed a ROSE-based custom compiler for automatically transforming MPI+OpenMP applications to use low-overhead scheduling
techniques and their runtime.
\item Assessed further opportunities to improve the performance of low-overhead schedulers, including improving their spatial locality.
\end{itemize}
\textbf{Lawrence Livermore Nat’l Lab$\>$$\>$$\>$$\>$Scholar$\>$$\>$$\>$$\>$Jun 2011 - Sep 2011}
\vspace*{-0.0in}
\begin{itemize}
\item Experimented with different OpenMP parameters for an MPI+OpenMP application code to understand performance optimizations on
LLNL supercomputers.
\item Developed a software design for a low-overhead loop scheduling library, based on the software design of libgomp.
\end{itemize}
\textbf{Lawrence Berkeley Nat’l Lab$\>$$\>$$\>$$\>$Summer Scholar$\>$$\>$$\>$$\>$Aug 2010 - Sep 2010}
\begin{itemize}
\item Analyzed results for the performance tests developed on NERSC machines.
\item Compared the results with the collectives of the MPI (MPICH2) runtime system.
\end{itemize}
\textbf{Lawrence Livermore Nat’l Lab$\>$$\>$$\>$$\>$Scholar$\>$$\>$$\>$$\>$May 2010 - Aug 2010}
\begin{itemize}
\item Modified the libgomp runtime system to integrate low-overhead schedulers into it.
\item Developed an algorithm for a multi-stage low-overhead loop scheduler, with each stage associated with a level of the memory hierarchy, allowing MPI shared-memory extensions to be used in conjunction with the low-overhead loop scheduling strategies.
\end{itemize}
\textbf{Goldman Sachs$\>\>\>\>$Summer Analyst$\>\>\>\>$Jun 2009 – Sep 2009}
\vspace*{-0.0in}
\begin{itemize}
\item Wrote code for testing trading system infrastructure functions under extreme market conditions.
\item Analyzed performance bottlenecks of system infrastructure functions.
\end{itemize}
\textbf{Proteus Technologies, LLC$\>\>\>\>$Software Developer$\>\>\>\>$Aug 2007 – Apr 2008}
\vspace*{-0.0in}
\begin{itemize}
\item Primarily responsible for developing, testing and documenting a service-oriented software application for health and status monitoring of large-scale parallel and distributed networked systems.
\item Developed company standards for software development (System Requirements Specifications, Design Documentation).
\item Designed and implemented algorithms for cost optimization applications. Used dynamic programming, discrete optimization heuristics, and APIs.
\end{itemize}