A few weeks ago, @wizardofzos tweeted about a unix shell script that showed a bug on z/OS.
Here is the script:
    #!/bin/bash
    mkdir -p broken
    c=1
    while [ $c -le 4000 ]
    do
      f=$RANDOM
      touch broken/$f
      setfacl -m "u:IBMUSER:rwx" broken/$f
      clear
      echo "Testing cut" | cut -c1
      echo "Done for $c"
      let c=c+1
    done
I was curious, so I tried to reproduce it on my system. The original script runs under bash. I tried it under both bash and /bin/sh but was unable to reproduce the bug, so I couldn’t do any further investigation.
What I did notice, however, was that the script was much, much faster under /bin/sh. That was interesting, so I had a closer look at the SMF data using EasySMF. I ran the 2 jobs for 1000 iterations of the loop. 1000 iterations created approximately 5000 unix tasks. Due to the various type 30 subtypes, the /bin/sh job produced about 20,000 type 30 records and the bash job about 28,000.
Here is part of the Job Completions report for the 2 jobs:
ANDREWRA is the job running bash, ANDREWRB runs /bin/sh.
The bash job took 2 minutes elapsed versus 1 minute for /bin/sh, but more interesting is the CPU time: more than 1 minute for bash, less than 5 seconds for /bin/sh. Those CPU times are the totals for all the descendants grouped under the collapsed top level jobs in the report.
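For a rough cross-check without SMF, the shell's own `times` builtin can report CPU time for a command and its descendants. The following is only a sketch: `cpu_report` is a hypothetical helper name, and it assumes that the shell forks for `( ... )` subshells so the counters start from zero on each call. It will not match the SMF numbers exactly, but it should show the same order-of-magnitude difference if one exists.

```shell
#!/bin/sh
# cpu_report (hypothetical helper): run a command in a fresh subshell, then
# print the CPU time accumulated by that subshell and by the children it
# waited for.
cpu_report() {
    (
        # Discard the command's own output so only the timings are shown.
        "$@" >/dev/null 2>&1
        # 'times' prints two lines: user/system CPU for this shell,
        # then user/system CPU for its child processes.
        times
    )
}
```

Usage might look like `cpu_report bash ./loop.sh` followed by `cpu_report /bin/sh ./loop.sh`, where `loop.sh` is an assumed file name for the test script. The second line of each result is the one to compare, since it covers the spawned commands.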
The Unix Work report shows this in more detail. It works from the Step End SMF records and shows substep information as well. This is part of the report for ANDREWRA running bash:
We can see some interesting stuff here:
- Very little of the CPU time is charged back to the owning job JOB06732. Most of it is in OMVS sub tasks.
- The top level bash step uses the most CPU time.
- We can see what looks like the fork/exec pattern described in the SMF manual, where the parent forks and then execs another program creating a sub-step in SMF.
- Forking the bash task also seems to use a relatively large amount of CPU time.
- Although the Job Number STC06734 is the same for all these tasks, they are actually separate tasks reusing the same OMVS initiator.
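The fork/exec pattern mentioned in the list above can be seen from the shell itself. This small sketch is not taken from the SMF data; it only illustrates the general Unix behaviour that a fork creates a process with a new PID, while an exec replaces the program and keeps the PID. That would be consistent with an exec showing up as a sub-step of an existing task rather than as a new task, although that mapping is my assumption.

```shell
#!/bin/sh
# fork: the child shell is a new process, so it reports a different PID.
parent_pid=$$
child_pid=$(sh -c 'echo $$')

# exec: the first shell prints its PID, then replaces itself with a second
# shell, which prints its own PID. The two values should be identical.
exec_pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
before_exec=$(printf '%s\n' "$exec_pids" | sed -n 1p)
after_exec=$(printf '%s\n' "$exec_pids" | sed -n 2p)

echo "parent=$parent_pid child=$child_pid"
echo "before exec=$before_exec after exec=$after_exec"
```

The actual PID values will differ on every run; the point is only that the first pair differs and the second pair matches.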
The same report for ANDREWRB running /bin/sh:
Interesting differences:
- The top level shell uses a lot less CPU.
- It does not show the same sub-step pattern. The commands themselves exist as top level steps in the OMVS initiator. Overall it uses a lot less CPU.
Conclusions
This isn’t meant to be a criticism of the bash port. I imagine IBM has a lot more ability to get into the operating system internals and optimize /bin/sh. But it is interesting to see the difference in resource usage.
The obvious conclusion would be to avoid bash for shell scripts with significant loops, or (probably) scripts that spawn many subcommands, e.g. find. Use bash if you really need its functionality, e.g. for login shells.
Much of my description of what is going on is speculative, based on what I can see in SMF. If you know more, please feel free to comment.