While reading the libdnn source code, I started surveying the existing technologies around DNN. Aside from Caffe and Theano, one emerging framework that kept showing up in my reading is Torch 7. An unusual aspect of Torch 7 is that it is built on LuaJIT, which in the era of R and Python is a quite noticeable difference. I had read on 云風’s blog that LuaJIT is pretty good, but I never had time to get my feet wet. Now is the time.
I used a small, arbitrary benchmark to test its performance. It is not a rigorous analysis, let alone scientific proof, but I think it is enough to get some clues about how fast it is. The test data is a 1 GB file generated with dd. Each program reads the file in 512-byte blocks, uses each block as a hash-map key, and counts how many times each block occurs, i.e. it detects duplicates in the byte stream.
#!/bin/bash
dd if=/dev/urandom of=large_random.txt bs=1024 count=0 seek=$[1000*1000]
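A side note: with count=0, the dd command above does not actually copy anything from /dev/urandom; it only seeks so that large_random.txt gets extended to roughly 1 GB, which means the file reads back as zero bytes rather than random data. Every implementation still processes the exact same input, so the cross-language comparison stays apples-to-apples, but if you want the file to really contain random bytes, a variant along these lines (much slower to generate) should work:

#!/bin/bash
# Hypothetical variant: actually copy ~1 GB from /dev/urandom instead of
# just extending the file; 1024 bytes * 1,000,000 blocks is about 1 GB.
dd if=/dev/urandom of=large_random.txt bs=1024 count=$[1000*1000]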
Let’s take a look at the results. The same algorithm was implemented in Lua, Python, C++, Haskell, and CoffeeScript. LuaJIT is lightning fast, at least in this case: it took only around one second on my MacBook Air (2013 model). Next comes Haskell; I am quite impressed that Haskell’s hash map can be tuned to be this fast (the code is borrowed from a Stack Overflow thread from a long time ago). Then comes C++, which took around four seconds. I suspect this is because of the extra work of copying the buffer’s contents into a string for every block, or perhaps because unordered_map is not that well optimised. Next is Python; most of its dictionary operations are implemented in C, so this speed does not surprise me.
Node.js’s speed is quite disappointing; I am not sure whether the third-party streaming library is dragging it down or whether it is an inherent problem of Node.js. It took around fifteen seconds to finish, and io.js runs a little faster than node. Judging from this result I probably did something wrong, so any corrections are welcome.
➜ uniq-blocks git:(master) ✗ time luajit lua/uniq-blocks.lua large_random.txt
luajit lua/uniq-blocks.lua large_random.txt 0.87s user 0.30s system 98% cpu 1.183 total
➜ uniq-blocks git:(master) ✗ time haskell/dist/build/uniq-blocks/uniq-blocks large_random.txt
haskell/dist/build/uniq-blocks/uniq-blocks large_random.txt 1.99s user 0.24s system 98% cpu 2.263 total
➜ uniq-blocks git:(master) ✗ time cpp/a.out large_random.txt
cpp/a.out large_random.txt 3.75s user 0.33s system 99% cpu 4.098 total
➜ uniq-blocks git:(master) ✗ time python python/uniq-blocks.py large_random.txt
python python/uniq-blocks.py large_random.txt 4.54s user 1.41s system 99% cpu 5.974 total
➜ uniq-blocks git:(master) ✗ time node js/uniq-blocks.js large_random.txt
reading large_random.txt …
node js/uniq-blocks.js large_random.txt 15.17s user 0.71s system 99% cpu 15.921 total
➜ uniq-blocks git:(master) ✗ time /opt/boxen/homebrew/Cellar/iojs/1.2.0/bin/iojs js/uniq-blocks.js large_random.txt
reading large_random.txt …
/opt/boxen/homebrew/Cellar/iojs/1.2.0/bin/iojs js/uniq-blocks.js 13.24s user 0.62s system 99% cpu 13.931 total
With these figures, I am quite impressed by LuaJIT. I will definitely invest more of my time in it.
The code for each implementation follows:
import os
import os.path
import sys

def dedupeFile(dd, fp):
    fd = os.open(fp, os.O_RDONLY)
    # Read the file in 512-byte blocks and count how often each block occurs.
    for block in iter(lambda: os.read(fd, 512), b''):
        dd.setdefault(block, 0)
        dd[block] = dd[block] + 1
    os.close(fd)
    return dd

dd = {}
dedupeFile(dd, sys.argv[1])
function usage(program_name)
  print("Usage: " .. program_name .. " filename")
end

function dedupeFile(file)
  local f = io.open(file, "rb")
  local t = {}
  -- Read the file in 512-byte blocks and count how often each block occurs.
  while true do
    local block = f:read(512)
    if not block then break end
    if t[block] == nil then
      t[block] = 1
    else
      t[block] = t[block] + 1
    end
  end
  f:close()
end

if arg[1] == nil then
  usage(arg[0])
  do return end
else
  filename = arg[1]
  dedupeFile(filename)
end
fs = require 'fs'
path = require 'path'
lazy = require 'lazy'
BlockStream = require 'block-stream'

printLine = (line) -> process.stdout.write line + '\n'
usage = -> printLine ("Usage: " + process.argv[1] + " <filename>")

if process.argv.length < 3
  usage()
  process.exit(0)
else
  filename = process.argv[2]

printLine("reading " + filename + " …")

d = {}
# Count occurrences of each 512-byte block (Buffer keys are coerced to strings).
dedupeFile = (chunk) ->
  if chunk of d
    d[chunk] += 1
  else
    d[chunk] = 1

block = new BlockStream(512)
rstream = fs.createReadStream(filename).pipe(block)
block.on('data', (chunk) -> dedupeFile(chunk))
block.on('end', -> printLine('Done'))
import Control.Monad
import qualified Data.ByteString.Lazy as LB
import Data.Foldable
import Data.HashMap.Strict hiding (foldl')
import Data.Int
import qualified Data.List as DL
import System.Environment

type DummyDedupe = HashMap LB.ByteString Int64

-- Split a lazy ByteString into n-byte blocks.
toBlocks :: Int64 -> LB.ByteString -> [LB.ByteString]
toBlocks n bs | LB.null bs = []
              | otherwise  = let (block, rest) = LB.splitAt n bs
                             in block : toBlocks n rest

-- Fold the blocks into the map, counting occurrences of each block.
dedupeBlocks :: [LB.ByteString] -> DummyDedupe -> DummyDedupe
dedupeBlocks = flip $ DL.foldl' (\acc block -> insertWith (+) block 1 $! acc)

dedupeFile :: FilePath -> DummyDedupe -> IO DummyDedupe
dedupeFile fp dd = LB.readFile fp >>= return . (`dedupeBlocks` dd) . toBlocks 512

main :: IO ()
main = do
  dd <- getArgs >>= (`dedupeFile` empty) . head
  putStrLn . show . (*512) . size $ dd
  putStrLn . show . (*512) . foldl' (+) 0 $ dd
#include <unordered_map>
#include <string>
#include <fstream>
#include <iostream>
#include <array>

void dedupeFile(std::ifstream &f)
{
    std::unordered_map<std::string, int> M;
    std::array<char, 512> buffer = { 0 };
    char *ptr = buffer.data();
    while (f.read(ptr, 512)) {
        // Use all 512 bytes as the key; constructing the string from a bare
        // char* would stop at the first NUL byte in binary data.
        std::string key(ptr, buffer.size());
        if (M.count(key) > 0) {
            M[key] += 1;
        } else {
            M[key] = 1;
        }
    }
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <filename>" << std::endl;
        return 1;
    }
    std::ifstream is(argv[1], std::ifstream::binary);
    if (is) {
        dedupeFile(is);
    } else {
        std::cerr << "reading failed" << std::endl;
    }
    return 0;
}
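One more caveat about the C++ number: the timing above runs cpp/a.out, and the build flags are not recorded in this post. If that binary was compiled without optimisation, the unordered_map and string-copying overhead discussed earlier would look considerably worse than necessary. A plausible optimised build (the source file name here is hypothetical) would be something like:

#!/bin/bash
# Hypothetical build command; -O2 matters a great deal for STL-heavy code
# such as std::unordered_map and std::string.
clang++ -std=c++11 -O2 -o cpp/a.out cpp/uniq-blocks.cpp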