multicore {multicore} R Documentation

multicore R package for parallel processing of R code

Description

multicore is an R package that provides functions for parallel execution of R code on machines with multiple cores or CPUs. Unlike other parallel processing methods, all jobs share the full state of R when spawned, so no data or code needs to be initialized. Spawning is also very fast, since no new R instance needs to be started.
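The shared-state behaviour described above can be illustrated with a minimal sketch. It assumes a Unix-like system and uses the parallel package shipped with current R, whose mclapply is likewise fork-based; where multicore itself is installed, library(multicore) would presumably take its place. The names lookup and counter are illustrative only.

```r
library(parallel)  # assumption: stands in for library(multicore)

lookup  <- c(a = 1, b = 2, c = 3)  # data created in the parent session only
counter <- 0                        # parent-side variable

# Each child is a fork of this session, so it sees 'lookup' without any
# explicit export. Assignments made inside a child stay in that child.
res <- mclapply(names(lookup), function(key) {
  counter <<- counter + 1           # changes the child's copy only
  lookup[[key]] * 10
}, mc.cores = 3)

print(unlist(res))
print(counter)                      # the parent's value is untouched
```

No setup code runs in the children: they inherit lookup as it was at the moment of the fork, and nothing they assign flows back except the returned values.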

Pivotal functions

mclapply - parallelized version of lapply

pvec - parallelization of vectorized functions

parallel and collect - functions to evaluate R expressions in parallel and collect the results.
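A brief sketch of the three entry points listed above, assuming a Unix-like system. It is written against the parallel package bundled with current R, where the counterparts of multicore's parallel and collect are named mcparallel and mccollect; the job names "total" and "product" are illustrative.

```r
library(parallel)  # assumption: used in place of library(multicore)

# mclapply: like lapply, but elements are processed in forked children
squares <- mclapply(1:4, function(i) i^2, mc.cores = 2)

# pvec: split a vector into chunks and apply a vectorized function to each
roots <- pvec(1:10, sqrt, mc.cores = 2)

# mcparallel / mccollect: start expressions asynchronously, then gather
# the results (multicore itself calls these parallel and collect)
jobs <- list(
  mcparallel(sum(1:10), name = "total"),
  mcparallel(prod(1:5), name = "product")
)
results <- mccollect(jobs)

print(unlist(squares))
print(roots)
print(results)
```

mclapply suits list-like work where each element is independent, pvec suits functions that are already vectorized, and the asynchronous pair suits a handful of unrelated expressions.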

Low-level functions

These functions should be used only by experienced users who understand the interaction between the master (parent) process and the child processes (jobs), as well as the system-level mechanics involved.

See the fork help page for the principles of forking parallel processes and for the system-level functions, and the children and sendMaster help pages for management of, and communication between, the parent and child processes.

Classes

multicore defines a few informal (S3) classes:

process is a list with a named entry pid containing the process ID.

childProcess is a subclass of process representing a child process of the current R process. A child process is a special process that can send messages to the parent process. The list may contain additional entries for IPC (more precisely, file descriptors); however, those are considered internal.

masterProcess is a subclass of process representing a handle that is passed to a child process by fork.

parallelJob is a subclass of childProcess representing a child process created using the parallel function. It may optionally contain a name entry: a character vector of length one giving the name of the job.
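The class hierarchy above can be inspected on a live job handle. This is a sketch under assumptions: it runs on a Unix-like system and uses mcparallel and mccollect from the bundled parallel package, which appear to return the same informal classes; the job name "demo" is illustrative.

```r
library(parallel)  # assumption: used in place of library(multicore)

# Start a job whose only task is to report its own process ID
job <- mcparallel(Sys.getpid(), name = "demo")

print(class(job))   # informal S3 classes carried by the handle
print(job$pid)      # the pid entry that every process object has
print(job$name)     # the optional name entry of a parallelJob

# The value computed in the child should match the pid in the handle
child_pid <- mccollect(job)[[1]]
print(child_pid == job$pid)
```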

Options

By default, functions that spawn jobs across cores use the "cores" option (see options) to determine how many cores (or CPUs) to use, unless the number is specified directly. If this option is not set, multicore uses as many cores as are available. (Note: cores in this document refers to virtual cores. Modern CPUs can have more virtual cores than physical cores to accommodate simultaneous multithreading. For example, a machine with two quad-core Xeon W5590 processors has eight physical cores in total but 16 virtual cores. Also note that it is often beneficial to schedule more tasks than cores.)
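A short sketch of both ways to choose the core count, assuming a Unix-like system. It uses the bundled parallel package, whose mclapply takes its default from the "mc.cores" option and not from "cores", so the option value is passed explicitly here to mirror the lookup that multicore is described as doing by default.

```r
library(parallel)  # assumption: used in place of library(multicore)

# Set the session-wide option named in the text
options(cores = 2)

# Use the option value (passed explicitly, see the note above)
by_option <- mclapply(1:6, function(i) i * 2, mc.cores = getOption("cores"))

# Or specify the number of cores directly for a single call
direct <- mclapply(1:6, function(i) i * 2, mc.cores = 3)

print(unlist(by_option))
print(identical(by_option, direct))
```

The core count changes only how the work is distributed; the results are the same either way.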

The number of available cores is determined on startup using the (non-exported) detectCores() function. It should work on most commonly used Unix systems (Mac OS X, Linux, Solaris and IRIX), but there is no standard way of determining the number of cores, so please contact me (with sessionInfo() output and the test) if you have tests for other platforms. If in doubt, use multicore:::detectCores(all.tests=TRUE) to see whether your platform is covered by one of the existing tests. If multicore cannot determine the number of cores (the above returns NA), it defaults to 8 (which should be fine for most modern desktop systems).
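The detection-with-fallback logic described above might look like the following in user code. This is a sketch only: it calls detectCores() as exported by the bundled parallel package (with multicore the call would be multicore:::detectCores()), and the variable n_cores is illustrative.

```r
library(parallel)  # assumption: used in place of multicore:::detectCores()

# Ask the platform-specific tests for the number of (virtual) cores
n_cores <- detectCores()

# Mirror the documented fallback when detection fails
if (is.na(n_cores)) {
  n_cores <- 8L
}

print(n_cores)
```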

Warning

multicore uses the fork system call to spawn a copy of the current process, which performs the computations in parallel. Modern operating systems use a copy-on-write approach, which makes this appealing for parallel computation: only objects modified during the computation are actually copied, and all other memory is shared directly.

However, the copy shares everything, including any user interface elements. This can cause havoc because, for example, one window now suddenly belongs to two processes. Therefore multicore should preferably be used in console R, and code executed in parallel must never use GUIs or on-screen devices.

An (experimental) way to avoid some such problems in some GUI environments (those using pipes or sockets) is to use multicore:::closeAll() in each child process immediately after it is spawned.

Author(s)

Simon Urbanek

See Also

parallel, mclapply, fork, sendMaster, children and signals


[Package multicore version 0.1-8 Index]