A Story about the Unix Command `yes`

Time:2019-8-13

Original article: A Little Story About the `yes` Unix Command

Preface: a first translation, done with trembling hands

This is my first time translating from a foreign language. Understanding a text and translating it are different things. What you see here is version 3.0.

Thanks to @EvianZenyatta for the explanations and thorough comments, to Eyun and @Tranch for the final corrections, and to @Honwhy for sharing the article~

If you find any mistranslation in this article, comments are welcome. Thank you (/////).


Beginning of translation

What is the simplest Unix command you know?

There is the echo command, which prints a string to standard output, and true, which always exits with code 0.

Among the many simple Unix commands, there is also yes. If you run yes without arguments, you get an endless stream of y characters, each separated by a newline:

y
y
y
y
(... you get the idea)

What seemed meaningless at first turned out to be very useful:

yes | sh boring_installation.sh

Have you ever installed a program that requires you to type “y” and press Enter to continue? The yes command comes to the rescue. It will dutifully do this job for you, so you can keep watching Pootie Tang (a musical comedy).

Writing yes

Here is a basic version of `yes`, written in, um, BASIC:

10 PRINT "y"
20 GOTO 10

Here’s how to write `yes` in Python:

while True:
    print("y")

Seems simple? No, not so fast!
It turns out that this program is very slow to execute.

python yes.py | pv -r > /dev/null
[4.17MiB/s]

Compare that with the built-in version on my Mac:

yes | pv -r > /dev/null
[34.2MiB/s]

So I tried to write a faster version in Rust. Here is my first attempt:

use std::env;

fn main() {
  let expletive = env::args().nth(1).unwrap_or("y".into());
  loop {
    println!("{}", expletive);
  }
}

Some explanations:

  • The string we want to print in the loop is the first command-line argument, named expletive. I learned this word from the yes man page.
  • unwrap_or fetches the expletive from the arguments. If no argument is given, “y” is used as the default.
  • into() converts the default from a string slice (&str) into an owned string on the heap (String).
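
The argument handling described above can be tried in isolation. The following is only a small sketch: pick_expletive is a hypothetical helper that does not appear in the article, and it takes any iterator of strings in place of env::args() so the default can be checked without launching a process.

```rust
use std::env;

/// Hypothetical helper: return the first real argument, or "y" if there is none.
/// `args` stands in for `env::args()`, whose item 0 is the program name.
fn pick_expletive<I: Iterator<Item = String>>(mut args: I) -> String {
    // nth(1) skips the program name; into() turns the &str default into a String.
    args.nth(1).unwrap_or("y".into())
}

fn main() {
    // Print the chosen string once instead of looping forever.
    let expletive = pick_expletive(env::args());
    println!("{}", expletive);
}
```

Run without arguments, this should fall back to the default. Run with an argument, it should print that argument instead.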

Let’s test the effect:

cargo run --release | pv -r > /dev/null
   Compiling yes v0.1.0
    Finished release [optimized] target(s) in 1.0 secs
     Running `target/release/yes`
[2.35MiB/s] 

Whoops, that doesn’t seem any faster. It is even slower than the Python version! That surprised me, so I went looking for the source code of a C implementation of yes.

Here is the very first version of the program, released with Version 7 Unix and authored by Ken Thompson on January 10, 1979:

main(argc, argv)
char **argv;
{
  for (;;)
    printf("%s\n", argc>1? argv[1]: "y");
}

There is no magic here.

Compare that with the 128-line version from GNU coreutils, which is mirrored on GitHub. After 25 years it is still under active development, and the last code change happened about a year ago. It is also much faster:

# brew install coreutils
gyes | pv -r > /dev/null 
[854MiB/s]

The important part is at the end:

/* Repeatedly output the buffer until there is a write error; then fail.  */
while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
  continue;

Aha! To make writes faster, they simply use a buffer. The buffer size is defined by a constant named BUFSIZ, which is chosen on each system so that I/O operations are efficient (see the further reading link). On my system it was defined as 1024 bytes. I actually got better performance with 8192 bytes.
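
To see why a buffer helps, one can count how often data is actually handed to the underlying stream. The following is a minimal sketch and not code from the article: CountingSink, emit, count_line_buffered and count_block_buffered are hypothetical names, and the sink only stands in for a real stdout. It rests on the assumption that Rust's stdout is line-buffered, which would explain why println! in a loop was so slow.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical sink that discards data but counts `write` calls,
/// standing in for write system calls on a real stdout.
struct CountingSink {
    writes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Send `lines` copies of "y\n" through `writer`, then flush it.
fn emit<W: Write>(writer: &mut W, lines: usize) -> io::Result<()> {
    for _ in 0..lines {
        writer.write_all(b"y\n")?;
    }
    writer.flush()
}

/// Line buffering: data is passed on whenever a newline is seen.
fn count_line_buffered(lines: usize) -> io::Result<usize> {
    let mut writer = LineWriter::new(CountingSink { writes: 0 });
    emit(&mut writer, lines)?;
    Ok(writer.get_ref().writes)
}

/// Block buffering: data is passed on only once `capacity` bytes are pending.
fn count_block_buffered(lines: usize, capacity: usize) -> io::Result<usize> {
    let mut writer = BufWriter::with_capacity(capacity, CountingSink { writes: 0 });
    emit(&mut writer, lines)?;
    Ok(writer.get_ref().writes)
}

fn main() -> io::Result<()> {
    let lines = 1000;
    println!("line-buffered:  {} writes", count_line_buffered(lines)?);
    println!("block-buffered: {} writes", count_block_buffered(lines, 8192)?);
    Ok(())
}
```

The line-buffered writer should report roughly one write per line, while the block-buffered one collects many lines into a single write. On a real stdout each of those writes is a system call, which is where the time goes.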

Okay, here is my improved Rust version:

use std::env;
use std::io::{self, BufWriter, Write};

const BUFSIZE: usize = 8192;

fn main() {
  let expletive = env::args().nth(1).unwrap_or("y".into());
  let mut writer = BufWriter::with_capacity(BUFSIZE, io::stdout());
  loop {
    writeln!(writer, "{}", expletive).unwrap();
  }
}

The key point is that the buffer size is a multiple of four, to ensure memory alignment.

Now it runs at 51.3MiB/s, faster than the version that ships with my system, but still far slower than the results from a Reddit post I found, where the author reports 10.2GiB/s (see also [efficient input and output] (https://www.gnu.org/software/…)).

Update

Again, the Rust community did not disappoint me.

As soon as this article was posted to the Rust subreddit, user nwydo pointed out an earlier discussion on the same topic. Here is the optimized code from that discussion, which breaks the 3GB/s mark on my machine:

use std::env;
use std::io::{self, Write};
use std::process;
use std::borrow::Cow;

use std::ffi::OsString;
pub const BUFFER_CAPACITY: usize = 64 * 1024;

pub fn to_bytes(os_str: OsString) -> Vec<u8> {
  use std::os::unix::ffi::OsStringExt;
  os_str.into_vec()
}

fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
  if output.len() > buffer.len() / 2 {
    return output;
  }

  let mut buffer_size = output.len();
  buffer[..buffer_size].clone_from_slice(output);

  while buffer_size < buffer.len() / 2 {
    let (left, right) = buffer.split_at_mut(buffer_size);
    right[..buffer_size].clone_from_slice(left);
    buffer_size *= 2;
  }

  &buffer[..buffer_size]
}

fn write(output: &[u8]) {
  let stdout = io::stdout();
  let mut locked = stdout.lock();
  let mut buffer = [0u8; BUFFER_CAPACITY];

  let filled = fill_up_buffer(&mut buffer, output);
  while locked.write_all(filled).is_ok() {}
}

fn main() {
  write(&env::args_os().nth(1).map(to_bytes).map_or(
    Cow::Borrowed(
      &b"y\n"[..],
    ),
    |mut arg| {
      arg.push(b'\n');
      Cow::Owned(arg)
    },
  ));
  process::exit(1);
}

Now that’s a completely different approach!

  • We prepare a pre-filled buffer that is reused on every loop iteration.
  • Stdout is protected by a lock, so instead of constantly acquiring and releasing it, we hold the lock the whole time.
  • We use the platform-native std::ffi::OsString and std::borrow::Cow to avoid unnecessary allocations.
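
The first point, the pre-filled buffer, can be looked at on its own. Below is a simplified sketch of the doubling idea, assuming a Vec instead of the fixed array used above. fill_by_doubling is a hypothetical name, and the empty-pattern guard is my own addition rather than part of the original code.

```rust
/// Hypothetical helper: repeat `pattern` into a block of at most `capacity`
/// bytes by doubling, so that a single write can emit many lines at once.
fn fill_by_doubling(pattern: &[u8], capacity: usize) -> Vec<u8> {
    // A pattern longer than half the capacity cannot be duplicated even once.
    // The empty check is an added guard; doubling zero bytes would never end.
    if pattern.is_empty() || pattern.len() > capacity / 2 {
        return pattern.to_vec();
    }

    let mut block = Vec::with_capacity(capacity);
    block.extend_from_slice(pattern);

    // Each pass appends a copy of everything written so far.
    // The `<` mirrors the loop condition of the code shown above.
    while block.len() < capacity / 2 {
        block.extend_from_within(..);
    }
    block
}

fn main() {
    let block = fill_by_doubling(b"y\n", 64 * 1024);
    println!("one write would carry {} bytes", block.len());
}
```

Because the block always holds a whole number of copies of the pattern, it can be written over and over without tracking an offset, which is what the write_all loop relies on.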

The only thing I could contribute was removing an unnecessary mut.

Lessons learned:

The seemingly trivial yes program turns out not to be so trivial after all. It uses output buffering and memory alignment to improve performance. Re-implementing Unix tools is fun, and it makes me appreciate the nifty tricks that make our computers fast.


Attached is the original text.

A Little Story About the yes Unix Command

What’s the simplest Unix command you know?
There’s echo, which prints a string to stdout, and true, which always terminates with an exit code of 0.

Among the rows of simple Unix commands, there’s also yes. If you run it without arguments, you get an infinite stream of y’s, separated by a newline:

y
y
y
y
(...you get the idea)

What seems to be pointless in the beginning turns out to be pretty helpful:

yes | sh boring_installation.sh

Ever installed a program that required you to type “y” and hit enter to keep going? yes to the rescue! It will carefully fulfill this duty, so you can keep watching Pootie Tang.

Writing yes

Here’s a basic version in… uhm… BASIC.

10 PRINT "y"
20 GOTO 10

And here’s the same thing in Python:

while True:
    print("y")

Simple, eh? Not so quick!
Turns out, that program is quite slow.

python yes.py | pv -r > /dev/null
[4.17MiB/s]

Compare that with the built-in version on my Mac:

yes | pv -r > /dev/null
[34.2MiB/s]

So I tried to write a quicker version in Rust. Here’s my first attempt:

use std::env;

fn main() {
  let expletive = env::args().nth(1).unwrap_or("y".into());
  loop {
    println!("{}", expletive);
  }
}

Some explanations:

  • The string we want to print in a loop is the first command line parameter and is named expletive. I learned this word from the yes manpage.
  • We use unwrap_or to get the expletive from the parameters. In case the parameter is not set, we use “y” as a default.
  • The default parameter gets converted from a string slice (&str) into an owned string on the heap (String) using into().

Let’s test it.

cargo run --release | pv -r > /dev/null
   Compiling yes v0.1.0
    Finished release [optimized] target(s) in 1.0 secs
     Running `target/release/yes`
[2.35MiB/s] 

Whoops, that doesn’t look any better. It’s even slower than the Python version! That caught my attention, so I looked around for the source code of a C implementation.

Here’s the very first version of the program, released with Version 7 Unix and famously authored by Ken Thompson on Jan 10, 1979:

main(argc, argv)
char **argv;
{
  for (;;)
    printf("%s\n", argc>1? argv[1]: "y");
}

No magic here.

Compare that to the 128-line version from the GNU coreutils, which is mirrored on GitHub. After 25 years, it is still under active development! The last code change happened around a year ago. That’s quite fast:

# brew install coreutils
gyes | pv -r > /dev/null 
[854MiB/s]

The important part is at the end:

/* Repeatedly output the buffer until there is a write error; then fail.  */
while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
  continue;

Aha! So they simply use a buffer to make write operations faster. The buffer size is defined by a constant named BUFSIZ, which gets chosen on each system so as to make I/O efficient (see here). On my system, that was defined as 1024 bytes. I actually had better performance with 8192 bytes.

I’ve extended my Rust program:

use std::env;
use std::io::{self, BufWriter, Write};

const BUFSIZE: usize = 8192;

fn main() {
    let expletive = env::args().nth(1).unwrap_or("y".into());
    let mut writer = BufWriter::with_capacity(BUFSIZE, io::stdout());
    loop {
        writeln!(writer, "{}", expletive).unwrap();
    }
}

The important part is that the buffer size is a multiple of four, to ensure memory alignment.

Running that gave me 51.3MiB/s. That is faster than the version that comes with my system, but still way slower than the results from this Reddit post that I found, where the author talks about 10.2GiB/s.

Update

Once again, the Rust community did not disappoint.
As soon as this post hit the Rust subreddit, user nwydo pointed out a previous discussion on the same topic. Here’s their optimized code, which breaks the 3GB/s mark on my machine:

use std::env;
use std::io::{self, Write};
use std::process;
use std::borrow::Cow;

use std::ffi::OsString;
pub const BUFFER_CAPACITY: usize = 64 * 1024;

pub fn to_bytes(os_str: OsString) -> Vec<u8> {
  use std::os::unix::ffi::OsStringExt;
  os_str.into_vec()
}

fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
  if output.len() > buffer.len() / 2 {
    return output;
  }

  let mut buffer_size = output.len();
  buffer[..buffer_size].clone_from_slice(output);

  while buffer_size < buffer.len() / 2 {
    let (left, right) = buffer.split_at_mut(buffer_size);
    right[..buffer_size].clone_from_slice(left);
    buffer_size *= 2;
  }

  &buffer[..buffer_size]
}

fn write(output: &[u8]) {
  let stdout = io::stdout();
  let mut locked = stdout.lock();
  let mut buffer = [0u8; BUFFER_CAPACITY];

  let filled = fill_up_buffer(&mut buffer, output);
  while locked.write_all(filled).is_ok() {}
}

fn main() {
  write(&env::args_os().nth(1).map(to_bytes).map_or(
    Cow::Borrowed(
      &b"y\n"[..],
    ),
    |mut arg| {
      arg.push(b'\n');
      Cow::Owned(arg)
    },
  ));
  process::exit(1);
}

Now that’s a whole different ballgame!

  • We prepare a filled string buffer, which will be reused for each loop.
  • Stdout is protected by a lock. So, instead of constantly acquiring and releasing it, we keep it all the time.
  • We use the platform-native std::ffi::OsString and std::borrow::Cow to avoid unnecessary allocations.

The only thing that I could contribute was removing an unnecessary mut.

Lessons learned

The trivial program yes turns out not to be so trivial after all. It uses output buffering and memory alignment to improve performance. Re-implementing Unix tools is fun and makes me appreciate the nifty tricks that make our computers fast.
