some more clarifications in the --check docs

author: Jack O'Connor <[email protected]> 2020-05-16 13:29:10 -0400
committer: Jack O'Connor <[email protected]> 2020-05-16 13:29:10 -0400
commit: cd436251b61eded574f1a19c24674ea71eacd955 (patch)
tree: 7fab76bd7cf2c2d9cbee50f2b86de23cea1cbae6
parent: e1f3043e76597ea160346d242628c89597ae2198 (diff)
1 files changed, 18 insertions, 16 deletions
diff --git a/b3sum/what_does_check_do.md b/b3sum/what_does_check_do.md
index 1f7bc80..57c8eaf 100644
--- a/b3sum/what_does_check_do.md
+++ b/b3sum/what_does_check_do.md
@@ -55,19 +55,19 @@ and very similar output for failure.
 
 Since the checkfile format (the regular output format of `b3sum`) is
 newline-separated text, we need to worry about what happens when a filepath
-contains a newline, or worse. Suppose we create a file named `abc[newline]def`
-(7 characters). One way to create such a file is with a Python one-liner like
+contains a newline, or worse. Suppose we create a file named `x[newline]x`
+(3 characters). One way to create such a file is with a Python one-liner like
 this:
 
 ```python
->>> open("abc\ndef", "w")
+>>> open("x\nx", "w")
 ```
 
 Here's what happens when we hash that file with `b3sum`:
 
 ```bash
-$ b3sum abc*
-\af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262  abc\ndef
+$ b3sum x*
+\af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262  x\nx
 ```
 
 Notice two things. First, `b3sum` puts a single `\` character at the front of
@@ -117,7 +117,7 @@ However, tragically, we *can* create a file with that byte in its name (on
 Linux at least, though not usually on macOS):
 
 ```python
->>> open(b"def\xFFghi", "w")
+>>> open(b"y\xFFy", "w")
 ```
 
 So some filepaths aren't representable in Unicode at all. Our plan to "convert
@@ -125,8 +125,8 @@ platform-specific bytes into some consistent Unicode encoding" isn't going to
 work for everything. What does `b3sum` do with the file above?
 
 ```bash
-$ b3sum def*
-af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262  def�ghi
+$ b3sum y*
+af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262  y�y
 ```
 
 That � in there is a "Unicode replacement character". When we run into
@@ -137,19 +137,21 @@ see a replacement character. Together with a few more details covered in the
 next section, this gives us an important set of properties:
 
 1. Any file can be hashed locally.
-2. Any file with a valid Unicode name can be checked.
-3. Checkfiles are always valid UTF-8.
-4. Checkfiles are portable between Unix and Windows.
+2. Any file with a valid Unicode name not containing the � character can be
+   checked.
+3. Checking ambiguous or unrepresentable filepaths always fails.
+4. Checkfiles are always valid UTF-8.
+5. Checkfiles are portable between Unix and Windows.
 
 ## Formal Rules
 
 1. When hashing, filepaths are represented in a platform-specific encoding,
-   which can accommodate any filepath on the current platform. (In Rust, this
-   is `OsStr`/`OsString`.)
+   which can accommodate any filepath on the current platform. In Rust, this is
+   `OsStr`/`OsString`.
 2. In output, filepaths are first converted to UTF-8. Any non-Unicode segments
-   are replaced with Unicode replacement characters. (In Rust, this is
-   `OsStr::to_string_lossy`.)
-3. Then, if a filepath contains a backslash (U+005C) or a newline (U+000A),
+   are replaced with Unicode replacement characters (U+FFFD). In Rust, this is
+   `OsStr::to_string_lossy`.
+3. Then, if a filepath contains any backslashes (U+005C) or newlines (U+000A),
    these characters are escaped as `\\` and `\n` respectively.
 4. Finally, any output line containing an escape sequence is prefixed with a
    single backslash.
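
Formal rules 2 through 4 above can be sketched in Python, the language of the document's own examples. This is only an illustration, not `b3sum`'s actual implementation (which is written in Rust). The names `escape_filepath` and `format_line` are hypothetical, and the sketch assumes that Python's `errors="replace"` decoding stands in for `OsStr::to_string_lossy` on Unix-style byte paths.

```python
def escape_filepath(raw: bytes) -> str:
    """Apply rules 2 and 3 to a raw (Unix-style) filepath.

    Hypothetical helper for illustration only.
    """
    # Rule 2: convert to Unicode, replacing invalid bytes with U+FFFD.
    # Assumption: this approximates Rust's OsStr::to_string_lossy.
    text = raw.decode("utf-8", errors="replace")
    # Rule 3: escape backslashes first, so the backslash introduced by
    # the newline escape is not doubled afterwards.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def format_line(hex_hash: str, raw: bytes) -> str:
    """Build one checkfile line from a hex hash and a raw filepath."""
    lossy = raw.decode("utf-8", errors="replace")
    escaped = escape_filepath(raw)
    # Rule 4: prefix the whole line with a single backslash if any
    # escape sequence was needed.
    prefix = "\\" if escaped != lossy else ""
    return f"{prefix}{hex_hash}  {escaped}"


if __name__ == "__main__":
    h = "af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262"
    print(format_line(h, b"x\nx"))
    print(format_line(h, b"y\xffy"))
```

For the two filenames used in the diff, the first call produces a line with a leading backslash and a literal `\n` escape, and the second produces an unprefixed line containing a replacement character, matching the `b3sum` output shown in the document.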