This is a follow-up to the LuaJIT benchmark. The task is to dedup a big random binary file, using each 512-byte block as the key. I wrote the same benchmark in Rust to get an idea of how well it performs. Sadly, with Rust 1.0 alpha2, the std::old_io runtime library performs terribly: a naive read loop takes around 20 seconds on my machine, even worse than the Node.js figures. I then decided to skip the runtime library and do the I/O through libc FFI, which brought the figures much closer to what I expected, but it is still slower than the naive C++ implementation on Mac OS X.
Here are the timing results:
➜ rust git:(master) rustc --version
rustc 1.0.0-nightly (522d09dfe 2015-02-19) (built 2015-02-20)
➜ rust git:(master) time ./target/uniq-blocks
./target/uniq-blocks 2.95s user 1.58s system 99% cpu 4.560 total
And the code:
#![feature(libc)]
extern crate libc;
use libc::funcs::posix88::unistd::read;
use libc::funcs::posix88::fcntl::open;
use libc::consts::os::posix88::O_RDONLY;
use libc::consts::os::posix88::S_IREAD;
use libc::types::common::c95::c_void;
use std::ffi::CString;
use std::collections::HashMap;
fn main() {
    let path_str = CString::new("large_random.txt").unwrap();
    let fd = unsafe {
        open(path_str.as_ptr(), O_RDONLY, S_IREAD)
    };
    // Zero-filled so the slice below really covers 512 bytes;
    // Vec::with_capacity(512) alone leaves the length at 0.
    let mut buf: Vec<u8> = vec![0u8; 512];
    // Keys are raw bytes: random binary data is generally not valid UTF-8.
    let mut counts: HashMap<Vec<u8>, u32> = HashMap::new();
    loop {
        let nread = unsafe { read(fd, buf.as_mut_ptr() as *mut c_void, 512) };
        // 0 means end of file, a negative value means a read error.
        if nread <= 0 {
            break;
        }
        // Only the bytes actually read form the key.
        let key = buf[..nread as usize].to_vec();
        if counts.contains_key(&key) {
            match counts.get_mut(&key) {
                Some(x) => { *x += 1 },
                None => ()
            };
        } else {
            counts.insert(key, 1);
        }
    }
}
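For comparison, here is a minimal sketch of the same 512-byte block counting written against the standard library's std::io::Read trait as it exists in stable Rust today, not the std::old_io or libc code measured above. The names count_blocks and BLOCK_SIZE are made up for this illustration, and the input is an in-memory byte slice standing in for large_random.txt so the example runs without the file. I have not benchmarked it, so no timing claim is attached.

```rust
use std::collections::HashMap;
use std::io::{self, Read};

/// Block size used as the dedup key (512 bytes, as in the benchmark).
const BLOCK_SIZE: usize = 512;

/// Counts how often each block occurs in `reader`.
/// `count_blocks` is a hypothetical helper name used only in this sketch.
fn count_blocks<R: Read>(mut reader: R) -> io::Result<HashMap<Vec<u8>, u32>> {
    let mut counts: HashMap<Vec<u8>, u32> = HashMap::new();
    let mut buf = [0u8; BLOCK_SIZE];
    loop {
        // A single read() may return fewer bytes than requested,
        // so keep reading until the block is full or the input ends.
        let mut filled = 0;
        while filled < BLOCK_SIZE {
            let n = reader.read(&mut buf[filled..])?;
            if n == 0 {
                break; // end of input
            }
            filled += n;
        }
        if filled == 0 {
            break; // nothing left to count
        }
        // The key is exactly the bytes read; a short final block is kept as-is.
        *counts.entry(buf[..filled].to_vec()).or_insert(0) += 1;
        if filled < BLOCK_SIZE {
            break; // the short block was the last one
        }
    }
    Ok(counts)
}

fn main() -> io::Result<()> {
    // In-memory stand-in for the benchmark file: blocks A, B, A and a short tail.
    let mut data = Vec::new();
    data.extend_from_slice(&[b'A'; BLOCK_SIZE]);
    data.extend_from_slice(&[b'B'; BLOCK_SIZE]);
    data.extend_from_slice(&[b'A'; BLOCK_SIZE]);
    data.extend_from_slice(b"tail");

    let counts = count_blocks(&data[..])?;
    println!("distinct blocks: {}", counts.len());
    Ok(())
}
```

To run it on the real file, the slice can be swapped for std::io::BufReader::new(std::fs::File::open("large_random.txt")?), since count_blocks accepts anything that implements Read.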