I spend some day to learn a powerful tool that is called parallel. It can help you split several tasks into multiple groups to run them parallel. Get into the car!

  • parallel -k echo ::: A B C > abc-file

-k will keep order or it will output A B C arbitrality.

  • perl -e 'for(1..8){print "$_\n"}' > num8

This command is very useful because you will use it frequently later.

  • (echo %head1; echo %head2; perl -e 'for(1..10){print "$_\n"}') > num_%header

  • parallel echo ::: A B C

Same as:

  • parallel -a abc-file echo

  • parallel echo ::: A B C ::: D E F

Same as:

  • parallel -a abc-file -a def-file echo

  • parallel --xapply echo ::: A B C ::: D E F

Output (the order may be different):

  A D
  B E
  C F
  • ::: mean literal values, :::: mean file(same as -a)
  • parallel echo {} ::: A/B.C
  • parallel echo {.} ::: A/B.C

Output:

   A/B
  • parallel echo {/} ::: A/B.C

Output:

  B.C
  • parallel echo {//} ::: A/B.C

Output:

  A
  • parallel echo {/.} ::: A/B.C

Output:

  B
  • parallel echo {#} ::: A B C

Output (the order may be different):

  1
  2
  3
  • parallel -j 2 echo {\%} ::: A B C

Output:

  1
  2
  1

job slot number, \ could be removed(I add it cause my template syntax and so do whitespace)

  • Summary:
    • {} print out each element
    • {.} remove the extension
    • {/} remove the path(only keep file name)
    • {//} keep the path
    • {/.} remove path and extension
    • {#} job number
    • {\%} job slot number
    • {number} number is index param list, like {1} mean first param
  • parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10

Split to 4 tasks to run

Output:

 1 2 3
 4 5 6
 7 8 9
 10
  • parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G

Split to 4 tasks, but each has separate param

Output (the order may be different):

  pre-A-post pre-B-post
  pre-C-post pre-D-post
  pre-E-post pre-F-post
  pre-G-post
  • parallel -N3 echo ::: A B C D E F G H

Limit each task has 3 params

  • parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H
  • cat num30000 | parallel --xargs echo | wc -l
  • parallel --tmux

Use tmux for every job(use ctrl-b p to switch session)

  • parallel --eta
  • parallel --progress

Show progress of computations. kill -s USR2 to Parallel to toggle turning on/off –progress

  • parallel --joblog /tmp/log

The log file will contain job id, run time, exit code, command and so on.

  • parallel --resume-failed --joblog /tmp/log

Use --resume-failed will rerun failed task from the joblog, so you also can use parallel as your task executor that it can resume automatically.

  • parallel --memfree 1G

parallel will run job when memory large 1G.

Notice, parallel will kill youngest job if free memory below 50% of size. The killed job will be pushed back the queue.

  • parallel --pipe -N140000 --block 2M wc

This will split several group pass to wc, N and block to limit each group.

  • parallel --pipe --round-robin

If you don’t care order of records you can use round-robin

  • parallel -S $SERVER1 --transferfile --return {}.out --cleanup

This will transfer result file back to local from remote server and clean origin files.

  • parallel -S $SERVER1 --trc {}.out

As same as above.

Reference: