
75_MakingKyotoCabinetRubyBindingToWorkInParallel

Page history last edited by Makoto Inoue 13 years, 10 months ago

Development memo: a record of technical work notes and ideas

2010/04/06 00:20

mikio

Making Kyoto Cabinet Ruby binding work in parallel

I explained how to use the Kyoto Cabinet (KC) Ruby binding in a previous article. This time, I describe how I tried out the parallelization I touched on there.

Threading model in Ruby 1.9

The Ruby 1.9 manual says the following:

"Threads are implemented using native threads, but in the current implementation the Ruby VM has a Giant VM lock (GVL), so only one native thread runs at any one time. However, the GVL is released when making IO-related system calls that may block, and in that case threads can run simultaneously. Extension libraries can also manipulate the GVL, so it is possible to write extension libraries that run multiple threads simultaneously."

This means that if DBM operations are performed after releasing the GVL, they should be able to run in parallel both with each other and with Ruby code. Since KC's selling point is its performance under parallel processing, I had to find a way to parallelize the binding.
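The behaviour the manual describes can be observed from plain Ruby, without any extension library. The following is a minimal sketch, assuming a Ruby version newer than the 1.9 discussed here (it uses Process.clock_gettime) and using sleep as a stand-in for a blocking call; the 0.2 second wait is an arbitrary value chosen for illustration.

```ruby
# Four threads each make a blocking call. Because Ruby releases the GVL
# while a thread is blocked, the waits overlap instead of adding up.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

WAIT = 0.2 # seconds each call spends blocked (arbitrary value)

# One thread making the four blocking calls back to back.
serial_time = elapsed { 4.times { sleep WAIT } }

# Four threads making one blocking call each.
threaded_time = elapsed do
  4.times.map { Thread.new { sleep WAIT } }.each(&:join)
end

puts format("serial:   %.2fs", serial_time)
puts format("threaded: %.2fs", threaded_time)
```

CPU-bound Ruby code would not overlap in the same way, because it holds the GVL while it runs. That is why the binding has to release the GVL explicitly around native DBM calls.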

There seems to be no documentation yet on how to manipulate the GVL from an extension library. Luckily, when I tweeted about it, someone kindly pointed out that it is described in comments in thread.c in the Ruby source.

This is not limited to the GVL. Whenever you are stuck on the C API for manipulating Ruby while writing an extension library, reading the source seems to be faster than googling. The functions exposed to users are listed in ruby/intern.h; once you find one with a promising name, look it up in the corresponding source file. In fact, you can usually guess roughly what a function does from its name and signature.

rb_thread_blocking_region

At first glance, the function name might suggest "a region of code that does something while blocking the thread". However, since the source comment says "permit concurrent/parallel execution", it probably means "run code that would block the thread inside this region". The signature is a little complicated:

VALUE
rb_thread_blocking_region(
    rb_blocking_function_t *func, void *data1,
    rb_unblock_function_t *ubf, void *data2);

Pass a pointer to the function you want to run without the GVL as func, and a pointer to the data you want to hand to that function as data1. I am not sure what ubf and data2 are for, but passing RUBY_UBF_IO and NULL seems to work. The source comment says "In short, this API is difficult to use safely.", which is a little scary, but simple use cases look manageable.

Abstractions

So I ended up rewriting the binding so that every KC API call runs inside rb_thread_blocking_region. Using such an expert-oriented function directly would hurt maintainability too much, so I added one layer of abstraction:

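// Base class for a functor whose operate() performs one native KC call.
class NativeFunction {
public:
  // Implemented by each subclass with the actual native operation.
  virtual void operate() = 0;
  // Runs func->operate(), releasing the GVL first on Ruby 1.9 (YARV).
  static void execute(NativeFunction* func) {
#if defined(_KC_YARV_)
    rb_thread_blocking_region(execute_impl, func, RUBY_UBF_IO, NULL);
#else
    // Ruby 1.8 fallback: call operate() directly.
    func->operate();
#endif
  }
private:
  // Adapter with the signature expected by rb_thread_blocking_region.
  static VALUE execute_impl(void* ptr) {
    NativeFunction* func = (NativeFunction*)ptr;
    func->operate();
    return Qnil;
  }
};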

You create an instance of a functor-like class that implements the operate method of NativeFunction, then pass it to execute, a class method of NativeFunction, which runs it inside rb_thread_blocking_region. So that the binding also works on Ruby 1.8 and not only on 1.9 (YARV), execute has a fallback that simply calls operate directly instead of going through rb_thread_blocking_region. It is used as follows:

bool db_remove_record(DB* db, const char* kbuf, size_t ksiz) {
  // Local functor that wraps the native remove call.
  class FuncImpl : public NativeFunction {
  public:
    FuncImpl(kc::PolyDB* db, const char* kbuf, size_t ksiz) :
      db_(db), kbuf_(kbuf), ksiz_(ksiz), rv_(false) {}
    // Result of the native call, valid after execute() returns.
    bool rv() {
      return rv_;
    }
  private:
    void operate() {
      rv_ = db_->remove(kbuf_, ksiz_);
    }
    kc::PolyDB* db_;
    const char* kbuf_;
    size_t ksiz_;
    bool rv_;
  } func(db, kbuf, ksiz);
  NativeFunction::execute(&func);
  return func.rv();
}

Benchmark results

I rewrote the existing code in this way until all the functionality was implemented, then ran a benchmark under the following conditions:

* Machine = Core2 Duo
* Ruby version = 1.9.1
* Number of threads = 4
* Number of records = 1 million

Mode       set    get    remove
Serial     6.17   5.71   5.60
Parallel   6.73   5.85   6.74

Hmm. It did not get faster at all; if anything, it got slower. I do not know the exact reason, but KC is probably nowhere near being the bottleneck at around 1 million records, so the overhead of parallelization outweighs the gain from the part that runs in parallel.
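A harness of roughly the following shape could be used to repeat the comparison. This is only a sketch under stated assumptions: StandInStore is a hypothetical in-memory class standing in for a database object, run_benchmark is an illustrative helper rather than the script used for the numbers above, and the record count is reduced to keep the run short. With the real binding, one would pass a database created in serial or parallel mode in place of the stand-in.

```ruby
require "benchmark"

# Hypothetical stand-in for the database: a Hash guarded by a Mutex.
# It only mimics the set/get/remove calls exercised by the benchmark.
class StandInStore
  def initialize
    @records = {}
    @lock = Mutex.new
  end

  def set(key, value)
    @lock.synchronize { @records[key] = value }
  end

  def get(key)
    @lock.synchronize { @records[key] }
  end

  def remove(key)
    @lock.synchronize { !@records.delete(key).nil? }
  end

  def count
    @lock.synchronize { @records.size }
  end
end

# Runs set, get and remove over `total` records split across `threads`
# threads, and returns the wall-clock seconds taken by each phase.
def run_benchmark(db, total, threads)
  per_thread = total / threads
  timings = {}
  [:set, :get, :remove].each do |op|
    timings[op] = Benchmark.realtime do
      workers = threads.times.map do |t|
        Thread.new do
          first = t * per_thread
          (first...(first + per_thread)).each do |i|
            key = format("%08d", i)
            case op
            when :set    then db.set(key, key)
            when :get    then db.get(key)
            when :remove then db.remove(key)
            end
          end
        end
      end
      workers.each(&:join)
    end
  end
  timings
end

RECORDS = 100_000 # fewer than the 1 million used above, to keep the run short
store = StandInStore.new
timings = run_benchmark(store, RECORDS, 4)
puts format("%-8s %8s %8s %8s", "", "set", "get", "remove")
puts format("%-8s %8.2f %8.2f %8.2f", "4 thr", *timings.values_at(:set, :get, :remove))
```

Each thread works on its own key range, so the threads only contend inside the store itself.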

This is rather sad. When the DB is small enough to fit in the file system cache, the DB layer is so fast that the Ruby layer becomes the bottleneck, so parallelizing the DB layer brings no benefit. When the DB is too large for the file system cache, the DB layer does become the bottleneck and you might expect parallelization to pay off. In that case, however, throughput is limited by the HDD, which can handle requests almost only serially, so parallelization ends up mattering little. That applies to today's HDD storage; with an SSD, which is good at parallel access, parallelization should pay off.
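One way to put rough numbers on this reasoning is Amdahl's law, which is brought in here as an illustration and is not part of the original measurements. Assuming the time spent in the DB layer is the only part that can run in parallel, and that the Ruby layer stays serial under the GVL, the best possible speedup depends on how large the DB share is. The shares below are hypothetical values, not measurements.

```ruby
# Amdahl's law: if a fraction `parallel_fraction` of the run time can be
# spread over `workers` threads and the rest stays serial, the best
# possible speedup is 1 / ((1 - p) + p / n).
def amdahl_speedup(parallel_fraction, workers)
  1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)
end

# Hypothetical shares of time spent in the DB layer (not measured values).
[0.1, 0.5, 0.9].each do |fraction|
  puts format("DB share %.0f%% -> at most %.2fx with 4 threads",
              fraction * 100, amdahl_speedup(fraction, 4))
end
```

When the DB share is small, even a perfectly parallel DB layer barely changes the total, and any overhead added by parallelization can push the result below the serial time.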

So the approach of parallelizing only the DB layer in Ruby, which otherwise runs serially, may not mean much in most users' environments today, but I will assume it is good news for people with reasonably good underlying hardware. I intend to verify how it actually works out once I get hold of a decent SSD environment.

How to use parallel mode and some gotchas

The binding runs in serial mode by default: KC's native API is called directly rather than through rb_thread_blocking_region. This keeps the overhead down in the many use cases that do not use threads. To enable parallel mode, pass true as the argument to DB::new.

db1 = DB::new        # Serial mode (default)
db2 = DB::new(true)  # Parallel mode

Apart from new, the binding is used in exactly the same way. However, DB#accept, DB#iterate, DB#each and Cursor#accept become unavailable in parallel mode. What they have in common is that they call back into Ruby code from inside KC's native API. Ruby code cannot be called there because code that runs with the GVL released must never touch the Ruby VM. Then again, even without accept, you can do anything as long as you have cas.
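The idea of getting by with cas alone can be sketched as a retry loop: read the value, compute the new one in ordinary Ruby code, and write it back only if nobody changed it in the meantime. The StandInDB class below is a hypothetical in-memory substitute so that the example runs on its own. Its get, set and cas methods and their signatures are assumptions for illustration and may not match the binding exactly, and atomic_update is an illustrative helper, not part of the binding.

```ruby
# Hypothetical in-memory stand-in for the database, for illustration only.
# It borrows the method names get/set/cas; the real binding may differ.
class StandInDB
  def initialize
    @records = {}
    @lock = Mutex.new
  end

  def get(key)
    @lock.synchronize { @records[key] }
  end

  def set(key, value)
    @lock.synchronize do
      @records[key] = value
      true
    end
  end

  # Stores new_value only if the current value equals old_value.
  # Returns true on success, false if another writer got there first.
  def cas(key, old_value, new_value)
    @lock.synchronize do
      if @records[key] == old_value
        @records[key] = new_value
        true
      else
        false
      end
    end
  end
end

# Read-modify-write without a callback inside the database:
# read the value, compute the new one in Ruby, and retry if cas fails.
def atomic_update(db, key)
  loop do
    old_value = db.get(key)
    new_value = yield(old_value)
    return new_value if db.cas(key, old_value, new_value)
  end
end

db = StandInDB.new
db.set("counter", "0")
workers = 4.times.map do
  Thread.new do
    100.times { atomic_update(db, "counter") { |v| (v.to_i + 1).to_s } }
  end
end
workers.each(&:join)
puts db.get("counter")
```

The block passed to atomic_update runs in normal Ruby code with the GVL held, so nothing calls back into Ruby from inside a native call. The cost is that a contended key may need several attempts.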

Summary

Parallel mode is available as of the latest version of the Kyoto Cabinet Ruby binding. Ruby programs can normally only run serially (or rather, concurrently but not in parallel), but native IO-style work such as DB operations can be executed in parallel. If you build a database large enough for KC to become the bottleneck, and you use a device with high parallel throughput such as an SSD, parallel mode should be useful. On that note, I will just mutter here that it would be nice if ROMA (http://github.com/roma/roma/) used it.
