-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
log_reader: tighten parsing #438
base: main
Are you sure you want to change the base?
Conversation
Created using spr 1.3.5
Are the |
Yes. The commit message should suggest that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, some minor nits.
|
||
def __init__( | ||
self, | ||
*, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to force keyword arguments here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for clarity.
for i in range(1, len(LogTestExampleBuilder.ErrorMarkers)): | ||
create_example( | ||
logfile, introduce_errors_pos=LogTestExampleBuilder.ErrorMarkers(i)) | ||
with self.assertRaises(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be more specific about the exception here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some are json, some are the log protocol. Listing them all would be tedious. We could wrap them, but how's that different from just saying "Exception"; finally they are individually not that interesting programmatically, insofar as I can tell, because regardless of the underlying error, programmatically there's little we can do. So the exceptions here matter as something to ultimately report to the user, but that's about it.
writer.write_observation_marker(0) | ||
writer.write_buff([1], ctypes.c_int16) | ||
|
||
with self.assertRaises(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a member of this team so please treat my suggestions as optional.
compiler_opt/rl/log_reader_test.py
Outdated
nl = '\n'.encode('utf-8') | ||
error_nl = 'hi there'.encode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more self-documenting to use newline
and error_newline
instead?
Also,
nl = '\n'.encode('utf-8') | |
error_nl = 'hi there'.encode('utf-8') | |
nl = b'\n' | |
error_nl = b'hi there' |
unnecessary-encode-utf8 (UP012)
Derived from the pyupgrade linter.
Fix is always available.
What it does
Checks for unnecessary calls to encode
as UTF-8.
Why is this bad?
UTF-8 is the default encoding in Python, so there is no need to call
encode
when UTF-8 is the desired encoding. Instead, use a bytes literal.
Example
"foo".encode("utf-8")
Use instead:
b"foo"
References
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, also renamed write_nl
to write_newline
.
Btw, any idea why ruff didn't report UP012?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Ruff does not report because #436 is still in review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh. well that explains that :)
@boomanaiden154 can #436 go in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, should be good to go now.
compiler_opt/rl/log_reader_test.py
Outdated
self.write_nl(LogTestExampleBuilder.ErrorMarkers.CTX_MARKER_POS) | ||
|
||
def write_observation_marker(self, obs_idx: int): | ||
self._opened_file.write(json_to_bytes({'observation': obs_idx})) | ||
self.write_nl(LogTestExampleBuilder.ErrorMarkers.OBS_MARKER_POS) | ||
|
||
def write_outcome_marker(self, obs_idx: int): | ||
self._opened_file.write(json_to_bytes({'outcome': obs_idx})) | ||
self.write_nl(LogTestExampleBuilder.ErrorMarkers.OUTCOME_MARKER_POS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.write_nl(LogTestExampleBuilder.ErrorMarkers.CTX_MARKER_POS) | |
def write_observation_marker(self, obs_idx: int): | |
self._opened_file.write(json_to_bytes({'observation': obs_idx})) | |
self.write_nl(LogTestExampleBuilder.ErrorMarkers.OBS_MARKER_POS) | |
def write_outcome_marker(self, obs_idx: int): | |
self._opened_file.write(json_to_bytes({'outcome': obs_idx})) | |
self.write_nl(LogTestExampleBuilder.ErrorMarkers.OUTCOME_MARKER_POS) | |
self.write_nl(self.ErrorMarkers.CTX_MARKER_POS) | |
def write_observation_marker(self, obs_idx: int): | |
self._opened_file.write(json_to_bytes({'observation': obs_idx})) | |
self.write_nl(self.ErrorMarkers.OBS_MARKER_POS) | |
def write_outcome_marker(self, obs_idx: int): | |
self._opened_file.write(json_to_bytes({'outcome': obs_idx})) | |
self.write_nl(self.ErrorMarkers.OUTCOME_MARKER_POS) |
Also, perhaps replace these with a single write_marker(self, field_value: dict, marker: ErrorMarkers)
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why self
instead of LogTestExampleBuilder
, the class isn't really instance-variant. As in, is it just that it's shorter?
I don't follow the write_marker
suggestion, how would that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Highly optional!
write_observation_marker(self, obs_idx=1)
--> write_marker(self, field_value={'observation': 1}, marker=self.ErrorMarkers.OBS_MARKER_POS)
write_outcome_marker(self, obs_idx=2)
--> write_marker(self, field_value={'outcome': 2}, marker=self.ErrorMarkers.OUTCOME_MARKER_POS)
compiler_opt/rl/log_reader_test.py
Outdated
|
||
def write_buff(self, buffer: list, ct): | ||
# we should get the ctypes array to bytes for pytype to be happy. | ||
if self._introduce_error_pos == \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 cautions against the use of backslash line continuation in Python code because any whitespace to the right of the backslash breaks the code on a change that is invisible to the reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
compiler_opt/rl/log_reader_test.py
Outdated
for i in range(1, len(LogTestExampleBuilder.ErrorMarkers)): | ||
create_example( | ||
logfile, introduce_errors_pos=LogTestExampleBuilder.ErrorMarkers(i)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for i in range(1, len(LogTestExampleBuilder.ErrorMarkers)): | |
create_example( | |
logfile, introduce_errors_pos=LogTestExampleBuilder.ErrorMarkers(i)) | |
for error_marker in self.ErrorMarkers: | |
if error_marker: # Skip the None marker | |
create_example(logfile, introduce_errors_pos=error_marker) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
compiler_opt/rl/log_reader_test.py
Outdated
self._opened_file.write(LogTestExampleBuilder.error_nl if position == self | ||
._introduce_error_pos else LogTestExampleBuilder.nl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._opened_file.write(LogTestExampleBuilder.error_nl if position == self | |
._introduce_error_pos else LogTestExampleBuilder.nl) | |
self._opened_file.write(self.error_nl if position == self._introduce_error_pos else self.nl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, depends on the motivation for using self
rather than the class name (previous question)
class LogTestExampleBuilder: | ||
"""Construct a log.""" | ||
|
||
newline = b'\n' | ||
error_newline = b'hi there' | ||
|
||
class ErrorMarkers(enum.IntEnum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newline, error_newline, and ErrorMarkers are all defined inside LogTestExampleBuilder so any instance of LogTestExampleBuilder can access them via self.
which is shorter to write/read but will automatically update if the outer class is renamed and they will also be accessible from subclasses of the outer class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... (being of a C++ mindset) not sure how to digest the auto-update part of the motivation. Is that (friendliness to opportunistic extensibility) the pythonic way though? (happy to follow that! basically asking to educate myself)
The log format has a few places where we insert
\n
for human readability. They should be checked they are indeed just that character. This will allow us, in a subsequent PR, to avoid infinitely-blocking cases due to the process producing the log exiting unexpectedtly, by inserting errors into the reader to unblock and exit it.Similarly, checking that the tensor data received matches in size what was expected.
Refactored a bit the test utility for constructing examples.