log_reader: tighten parsing #438

mtrofin · 2025-02-10T23:30:01Z

The log format has a few places where we insert \n for human readability. We should check that those positions contain exactly that character. This will allow us, in a subsequent PR, to avoid blocking indefinitely when the process producing the log exits unexpectedly, by injecting errors into the reader to unblock it and let it exit.

Similarly, check that the size of the tensor data received matches what was expected.
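
The two checks described above could look roughly like the sketch below. It is only an illustration: `LogFormatError`, `read_newline` and `read_tensor_bytes` are hypothetical names, not the ones used in log_reader.

```python
import io
from typing import BinaryIO


class LogFormatError(Exception):
  """Hypothetical error type for a log that deviates from the layout."""


def read_newline(stream: BinaryIO) -> None:
  """Consumes exactly one byte and insists that it is a newline."""
  got = stream.read(1)
  if got != b'\n':
    raise LogFormatError(f'expected newline, got {got!r}')


def read_tensor_bytes(stream: BinaryIO, expected_size: int) -> bytes:
  """Reads a tensor payload and checks that all expected bytes arrived."""
  data = stream.read(expected_size)
  if len(data) != expected_size:
    raise LogFormatError(
        f'expected {expected_size} bytes of tensor data, got {len(data)}')
  return data


# A well-formed fragment: two payload bytes followed by the separator.
stream = io.BytesIO(b'\x01\x00\n')
payload = read_tensor_bytes(stream, 2)
read_newline(stream)
```

A truncated payload or a non-newline separator raises immediately instead of being silently accepted, which is what a later change would need in order to unblock a stuck reader.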

Refactored the test utility for constructing examples a bit.

Created using spr 1.3.5

tvmarino · 2025-02-11T16:21:54Z

Are the env.py timeout changes going to be in a different pr?

mtrofin · 2025-02-11T17:36:43Z

> Are the env.py timeout changes going to be in a different pr?

Yes. The commit message should suggest that.

tvmarino

LGTM

boomanaiden154

Overall LGTM, some minor nits.

boomanaiden154 · 2025-02-11T20:18:39Z

compiler_opt/rl/log_reader_test.py

+
+  def __init__(
+      self,
+      *,


Why do we want to force keyword arguments here?

for clarity.
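
For readers unfamiliar with the bare `*` in a signature, here is a minimal sketch of what it enforces. `ExampleBuilder` and its parameters are hypothetical stand-ins, not the actual constructor from the PR.

```python
class ExampleBuilder:
  """Hypothetical helper whose options must be passed by keyword."""

  def __init__(self, *, opened_file=None, introduce_error_pos=0):
    self.opened_file = opened_file
    self.introduce_error_pos = introduce_error_pos


# The call site spells out which knob is being set.
builder = ExampleBuilder(introduce_error_pos=2)

try:
  ExampleBuilder(None, 2)  # positional arguments are rejected
  positional_rejected = False
except TypeError:
  positional_rejected = True
```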

boomanaiden154 · 2025-02-11T20:19:53Z

compiler_opt/rl/log_reader_test.py

+    for i in range(1, len(LogTestExampleBuilder.ErrorMarkers)):
+      create_example(
+          logfile, introduce_errors_pos=LogTestExampleBuilder.ErrorMarkers(i))
+      with self.assertRaises(Exception):


Can we be more specific about the exception here?

Some are JSON errors, some come from the log protocol. Listing them all would be tedious. We could wrap them, but how is that different from just saying "Exception"? Finally, as far as I can tell, they are individually not that interesting programmatically: regardless of the underlying error, there is little we can do in code. So the exceptions here matter as something to ultimately report to the user, but that's about it.
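
As a possible middle ground, `assertRaises` also accepts a tuple of exception classes. The sketch below assumes a hypothetical `LogProtocolError` and a toy `parse_header`; neither comes from the PR.

```python
import json
import unittest


class LogProtocolError(Exception):
  """Hypothetical stand-in for a log-format error raised by the reader."""


def parse_header(raw: bytes) -> dict:
  """Toy parser: a JSON header that must end with a newline."""
  if not raw.endswith(b'\n'):
    raise LogProtocolError('missing newline after header')
  return json.loads(raw)


class ParseHeaderTest(unittest.TestCase):

  def test_bad_inputs(self):
    for raw in (b'{"a": 1}', b'not json\n'):
      with self.subTest(raw=raw):
        # A tuple narrows the check without introducing a wrapper type.
        with self.assertRaises((json.JSONDecodeError, LogProtocolError)):
          parse_header(raw)
```

Whether the extra specificity is worth it is the judgment call discussed above; the tuple only documents which families of errors are expected.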

boomanaiden154 · 2025-02-11T20:19:58Z

compiler_opt/rl/log_reader_test.py

+      writer.write_observation_marker(0)
+      writer.write_buff([1], ctypes.c_int16)
+
+    with self.assertRaises(Exception):


cclauss

I am not a member of this team so please treat my suggestions as optional.

cclauss · 2025-02-12T07:08:38Z

compiler_opt/rl/log_reader_test.py

+  nl = '\n'.encode('utf-8')
+  error_nl = 'hi there'.encode('utf-8')


Would it be more self-documenting to use newline and error_newline instead?

Also,

Suggested change

-  nl = '\n'.encode('utf-8')
-  error_nl = 'hi there'.encode('utf-8')
+  nl = b'\n'
+  error_nl = b'hi there'

ruff-rules-for-pyupgrade #436

% ruff rule UP012

unnecessary-encode-utf8 (UP012)

Derived from the pyupgrade linter.

Fix is always available.

What it does

Checks for unnecessary calls to encode as UTF-8.

Why is this bad?

UTF-8 is the default encoding in Python, so there is no need to call
encode when UTF-8 is the desired encoding. Instead, use a bytes literal.

Example

"foo".encode("utf-8")

Use instead:

b"foo"

References

Python documentation: str.encode
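
A quick check of the equivalence the rule relies on, plus the case where the literal form does not apply (a minimal sketch, not taken from the PR):

```python
# Both spellings produce equal bytes objects; the literal skips the call.
encoded = 'hi there'.encode('utf-8')
literal = b'hi there'
print(encoded == literal)  # True

# Bytes literals only allow ASCII characters, so non-ASCII text still
# needs an explicit encode (or escape sequences).
snowman = '\u2603'.encode('utf-8')
print(snowman)  # b'\xe2\x98\x83'
```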

done, also renamed write_nl to write_newline.

Btw, any idea why ruff didn't report UP012?

Nice! Ruff does not report because #436 is still in review.

oh. well that explains that :)

@boomanaiden154 can #436 go in?

Yeah, should be good to go now.

cclauss · 2025-02-12T07:38:21Z

compiler_opt/rl/log_reader_test.py

+    self.write_nl(LogTestExampleBuilder.ErrorMarkers.CTX_MARKER_POS)
+
+  def write_observation_marker(self, obs_idx: int):
+    self._opened_file.write(json_to_bytes({'observation': obs_idx}))
+    self.write_nl(LogTestExampleBuilder.ErrorMarkers.OBS_MARKER_POS)
+
+  def write_outcome_marker(self, obs_idx: int):
+    self._opened_file.write(json_to_bytes({'outcome': obs_idx}))
+    self.write_nl(LogTestExampleBuilder.ErrorMarkers.OUTCOME_MARKER_POS)


Suggested change

-    self.write_nl(LogTestExampleBuilder.ErrorMarkers.CTX_MARKER_POS)
-
-  def write_observation_marker(self, obs_idx: int):
-    self._opened_file.write(json_to_bytes({'observation': obs_idx}))
-    self.write_nl(LogTestExampleBuilder.ErrorMarkers.OBS_MARKER_POS)
-
-  def write_outcome_marker(self, obs_idx: int):
-    self._opened_file.write(json_to_bytes({'outcome': obs_idx}))
-    self.write_nl(LogTestExampleBuilder.ErrorMarkers.OUTCOME_MARKER_POS)
+    self.write_nl(self.ErrorMarkers.CTX_MARKER_POS)
+
+  def write_observation_marker(self, obs_idx: int):
+    self._opened_file.write(json_to_bytes({'observation': obs_idx}))
+    self.write_nl(self.ErrorMarkers.OBS_MARKER_POS)
+
+  def write_outcome_marker(self, obs_idx: int):
+    self._opened_file.write(json_to_bytes({'outcome': obs_idx}))
+    self.write_nl(self.ErrorMarkers.OUTCOME_MARKER_POS)

Also, perhaps replace these with a single write_marker(self, field_value: dict, marker: ErrorMarkers) method.

Why self instead of LogTestExampleBuilder? The class isn't really instance-variant. As in, is it just that it's shorter?

I don't follow the write_marker suggestion, how would that work?

Highly optional!

write_observation_marker(self, obs_idx=1) --> write_marker(self, field_value={'observation': 1}, marker=self.ErrorMarkers.OBS_MARKER_POS)
write_outcome_marker(self, obs_idx=2) --> write_marker(self, field_value={'outcome': 2}, marker=self.ErrorMarkers.OUTCOME_MARKER_POS)
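
Spelled out, the single-method idea might look like the sketch below. This is a trimmed-down, hypothetical version of the builder: the `NONE` member name, the enum values, the constructor signature and the body of `json_to_bytes` are assumptions made for illustration.

```python
import enum
import io
import json


def json_to_bytes(d: dict) -> bytes:
  # Assumed behavior of the helper used in the test file.
  return json.dumps(d).encode('utf-8')


class LogTestExampleBuilder:
  """Trimmed-down, hypothetical version of the builder under review."""

  newline = b'\n'
  error_newline = b'hi there'

  class ErrorMarkers(enum.IntEnum):
    NONE = 0  # assumed name for the "no error" member
    CTX_MARKER_POS = 1
    OBS_MARKER_POS = 2
    OUTCOME_MARKER_POS = 3

  def __init__(self, *, opened_file, introduce_error_pos=ErrorMarkers.NONE):
    self._opened_file = opened_file
    self._introduce_error_pos = introduce_error_pos

  def write_newline(self, position):
    # Emit the bogus separator only at the position selected for corruption.
    self._opened_file.write(self.error_newline if position ==
                            self._introduce_error_pos else self.newline)

  def write_marker(self, field_value: dict, marker: ErrorMarkers):
    """Writes a JSON marker followed by its (possibly corrupted) newline."""
    self._opened_file.write(json_to_bytes(field_value))
    self.write_newline(marker)


buf = io.BytesIO()
builder = LogTestExampleBuilder(opened_file=buf)
builder.write_marker({'observation': 1}, builder.ErrorMarkers.OBS_MARKER_POS)
builder.write_marker({'outcome': 2}, builder.ErrorMarkers.OUTCOME_MARKER_POS)
```

The trade-off is that call sites become longer and the field name moves from the method name into the argument, which is presumably why the suggestion is marked as highly optional.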

cclauss · 2025-02-12T07:46:14Z

compiler_opt/rl/log_reader_test.py

+
+  def write_buff(self, buffer: list, ct):
+    # we should get the ctypes array to bytes for pytype to be happy.
+    if self._introduce_error_pos == \


PEP 8 cautions against backslash line continuation in Python code, because any whitespace to the right of the backslash breaks the code through a change that is invisible to the reader.
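
The hazard can be reproduced directly with `compile`. The snippet below is a self-contained sketch and is not code from the PR.

```python
# A backslash followed by a stray space is rejected by the parser...
broken = 'total = 1 + \\ \n    2\n'
try:
  compile(broken, '<demo>', 'exec')
  backslash_ok = True
except SyntaxError:
  backslash_ok = False

# ...while the same stray space inside parentheses is harmless.
safe = 'total = (1 + \n    2)\n'
namespace = {}
exec(compile(safe, '<demo>', 'exec'), namespace)

print(backslash_ok, namespace['total'])  # False 3
```

Wrapping the condition in parentheses gives the same line break without depending on invisible trailing characters.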

cclauss · 2025-02-12T07:49:20Z

compiler_opt/rl/log_reader_test.py

+    for i in range(1, len(LogTestExampleBuilder.ErrorMarkers)):
+      create_example(
+          logfile, introduce_errors_pos=LogTestExampleBuilder.ErrorMarkers(i))


Suggested change

-    for i in range(1, len(LogTestExampleBuilder.ErrorMarkers)):
-      create_example(
-          logfile, introduce_errors_pos=LogTestExampleBuilder.ErrorMarkers(i))
+    for error_marker in self.ErrorMarkers:
+      if error_marker:  # Skip the None marker
+        create_example(logfile, introduce_errors_pos=error_marker)
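
Both loops visit the same members as long as the "no error" member has value 0, since an `IntEnum` member follows integer truthiness. A small sketch, with the `NONE` name and the values assumed for illustration:

```python
import enum


class ErrorMarkers(enum.IntEnum):
  NONE = 0  # assumed name for the "no error" member
  CTX_MARKER_POS = 1
  OBS_MARKER_POS = 2
  OUTCOME_MARKER_POS = 3


# Index-based form from the original diff.
by_index = [ErrorMarkers(i) for i in range(1, len(ErrorMarkers))]

# Iteration-based form: members come back in definition order, and the
# zero-valued member is falsy, so the filter drops it.
by_iteration = [marker for marker in ErrorMarkers if marker]

print(by_index == by_iteration)  # True
```

The iteration form also keeps working if the member values stop being contiguous, whereas `ErrorMarkers(i)` would raise `ValueError` for a missing value.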

cclauss · 2025-02-12T08:01:18Z

compiler_opt/rl/log_reader_test.py

+    self._opened_file.write(LogTestExampleBuilder.error_nl if position == self
+                            ._introduce_error_pos else LogTestExampleBuilder.nl)


Suggested change

-    self._opened_file.write(LogTestExampleBuilder.error_nl if position == self
-                            ._introduce_error_pos else LogTestExampleBuilder.nl)
+    self._opened_file.write(self.error_nl if position == self._introduce_error_pos else self.nl)

ack, depends on the motivation for using self rather than the class name (previous question)

cclauss · 2025-02-12T17:09:23Z

compiler_opt/rl/log_reader_test.py

+class LogTestExampleBuilder:
+  """Construct a log."""
+
+  newline = b'\n'
+  error_newline = b'hi there'
+
+  class ErrorMarkers(enum.IntEnum):


newline, error_newline, and ErrorMarkers are all defined inside LogTestExampleBuilder, so any instance of LogTestExampleBuilder can access them via self. That is shorter to write and read, keeps working unchanged if the outer class is renamed, and also makes them accessible (and overridable) from subclasses of the outer class.

Hmm... (being of a C++ mindset) not sure how to digest the auto-update part of the motivation. Is that (friendliness to opportunistic extensibility) the pythonic way though? (happy to follow that! basically asking to educate myself)
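
The practical difference between the two spellings shows up once a subclass overrides the attribute. In the sketch below, `CrlfBuilder` and the two accessor methods are hypothetical and exist only to make the lookup visible.

```python
class LogTestExampleBuilder:
  newline = b'\n'
  error_newline = b'hi there'

  def via_class_name(self) -> bytes:
    # Always reads the attribute from this exact class.
    return LogTestExampleBuilder.newline

  def via_self(self) -> bytes:
    # Looks up the instance first, then its class, then base classes.
    return self.newline


class CrlfBuilder(LogTestExampleBuilder):
  """Hypothetical subclass overriding the class attribute."""
  newline = b'\r\n'


b = CrlfBuilder()
print(b.via_class_name())  # b'\n'
print(b.via_self())        # b'\r\n'
```

Coming from C++, `self.newline` behaves roughly like a lookup through the dynamic type, while naming the class pins the lookup to one static type.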

[𝘀𝗽𝗿] initial version

8d6f828

mtrofin requested review from boomanaiden154 and tvmarino February 11, 2025 00:00

tvmarino approved these changes Feb 11, 2025

boomanaiden154 approved these changes Feb 11, 2025

cclauss requested changes Feb 12, 2025

feedback

2911977

cclauss reviewed Feb 12, 2025

cclauss self-requested a review February 12, 2025 17:19

cclauss approved these changes Feb 12, 2025
