parallel
I spent a few days learning a powerful tool called parallel. It can split a set of tasks into groups and run them in parallel. Hop on!
parallel -k echo ::: A B C > abc-file
-k keeps the output in input order; without it, A B C may come out in any order.
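As a quick check, print the file we just created; because of -k each argument comes out on its own line, in order:
cat abc-file
Output:
A
B
C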
perl -e 'for(1..8){print "$_\n"}' > num8
Commands like this are handy for generating test input; we will use such number files again later.
(echo %head1; echo %head2; perl -e 'for(1..10){print "$_\n"}') > num_%header
parallel echo ::: A B C
Same as:
parallel -a abc-file echo
parallel echo ::: A B C ::: D E F
Same as:
parallel -a abc-file -a def-file echo
(assuming def-file was created like abc-file, but with D E F)
parallel --xapply echo ::: A B C ::: D E F
--xapply pairs the arguments from the two sources instead of combining them all. Output (the order may be different):
A D
B E
C F
::: means literal arguments on the command line; :::: means read arguments from a file (same as -a).
parallel echo {} ::: A/B.C
Output:
A/B.C
parallel echo {.} ::: A/B.C
Output:
A/B
parallel echo {/} ::: A/B.C
Output:
B.C
parallel echo {//} ::: A/B.C
Output:
A
parallel echo {/.} ::: A/B.C
Output:
B
parallel echo {#} ::: A B C
Output (the order may be different):
1
2
3
parallel -j 2 echo {%} ::: A B C
Output (the order and the slot numbers may differ):
1
2
1
{%} is the job slot number; with -j 2 there are two slots, so it is always 1 or 2.
- Summary:
  - {} : print each element as-is
  - {.} : remove the extension
  - {/} : remove the path (keep only the file name)
  - {//} : keep only the path
  - {/.} : remove both path and extension
  - {#} : job number
  - {%} : job slot number
  - {n} : the n-th positional argument, e.g. {1} is the first one
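As a small combined example of these replacement strings (hypothetical: it assumes ImageMagick's convert is installed and there are .jpg files in the current directory), this converts every .jpg into a .png with the same base name:
parallel convert {} {.}.png ::: *.jpg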
parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10
-m splits the arguments among the 4 job slots, so each echo gets a group of them.
Output (the order may be different):
1 2 3
4 5 6
7 8 9
10
parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G
-X also splits into 4 jobs, but the context around {} (here pre- and -post) is repeated for every argument.
Output (the order may be different):
pre-A-post pre-B-post
pre-C-post pre-D-post
pre-E-post pre-F-post
pre-G-post
parallel -N3 echo ::: A B C D E F G H
-N3 passes at most 3 arguments to each job, so the output is "A B C", "D E F", "G H".
parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H
With -N, {1} .. {3} refer to the positional arguments within each group; the last group only has G and H, so {3} expands to nothing there.
cat num30000 | parallel --xargs echo | wc -l
--xargs packs as many arguments onto each command line as it can hold (num30000 is a file with the numbers 1..30000, made like num8 above), so the 30000 input lines end up as only a handful of echo calls.
parallel --tmux
Run every job inside tmux (use ctrl-b p / ctrl-b n to switch between the job windows).
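A runnable sketch, assuming tmux is installed; each sleep runs in its own tmux window so you can watch them side by side:
parallel --tmux 'sleep {}; echo {}' ::: 5 10 15 20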
parallel --eta
parallel --progress
--eta estimates how long the remaining jobs will take; --progress shows the progress of the computations. Sending kill -s USR2 to the parallel process toggles --progress on and off.
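To actually see these displays, give parallel something slow to chew on, for example:
parallel --eta 'sleep {}' ::: 1 2 3 4 5
parallel --progress 'sleep {}' ::: 1 2 3 4 5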
parallel --joblog /tmp/log
The log file records each job's sequence number, runtime, exit code, the command that was run, and so on.
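For example, the sleeps below just make the runtimes differ, and cat shows the recorded table afterwards:
parallel --joblog /tmp/log 'sleep {}; echo {}' ::: 2 1 3
cat /tmp/log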
parallel --resume-failed --joblog /tmp/log
--resume-failed reruns the jobs that the joblog records as failed, so you can also use parallel as a task executor that resumes automatically.
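A minimal sketch of the resume behaviour: exit gives jobs 1-3 a non-zero exit code, so only those are rerun the second time.
parallel --joblog /tmp/log exit ::: 1 2 3 0 0 0
parallel --resume-failed --joblog /tmp/log exit ::: 1 2 3 0 0 0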
parallel --memfree 1G
parallel will only start a job when more than 1G of memory is free. Note that parallel kills the youngest job if free memory drops below 50% of that size; the killed job is put back on the queue and retried later.
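For example, this will not run the echo until at least 1G of memory is free:
parallel --memfree 1G echo ::: 'enough memory is free now'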
parallel --pipe -N140000 --block 2M wc
--pipe splits standard input into chunks and pipes each chunk to its own wc; -N caps the number of records per chunk and --block caps its size.
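A self-contained variant you can run directly, using seq as the input stream; each wc -l prints the line count of the chunk it received, and the counts add up to 1000000:
seq 1000000 | parallel --pipe --block 1M wc -l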
parallel --pipe --round-robin
If you don't care which job gets which records, --round-robin deals the blocks out to the already-running jobs instead of starting a new job per block.
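Same input as before, but with --round-robin the blocks are spread over a fixed set of running jobs (here 4), so you get one count per job:
seq 1000000 | parallel -j 4 --pipe --round-robin wc -l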
parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup
This transfers the input file to the remote server, runs the job there, returns the resulting {}.out to the local machine, and cleans up the transferred files afterwards.
parallel -S $SERVER1 --trc {}.out
Same as above: --trc {}.out is shorthand for --transferfile {} --return {}.out --cleanup.
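A fuller sketch, assuming $SERVER1 points at an ssh-reachable host and in1 and in2 are small local files: each input is shipped to the server, sorted there, and the sorted {}.out is brought back while the remote copies are cleaned up.
parallel -S $SERVER1 --trc {}.out 'sort {} > {}.out' ::: in1 in2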
Reference:
GNU Parallel tutorial: https://www.gnu.org/software/parallel/parallel_tutorial.html