Implementing the "wc" Unix command
Recently, I discovered my passion for software engineering, and have decided to pursue a career as a software engineer (SWE) when I complete my military service. As an aspiring SWE, I’m always looking for new ways to grow my knowledge and skills. I’ve discovered that I learn most by building real software, but it’s often hard for me to think of well-scoped projects.
Enter John Crickett’s coding challenges! He has posted dozens (and counting) of well-scoped projects that can usually be completed in ~8 hours. I hope to complete several of them over the next few months (years), starting with wc
. In this post, I’ll walk through my implementation of the unix/Unix command-line tool wc
.
About
wc
stands for word count. py-wc
is a minimal implementation of the Unix-style command-line tool named wc
, implemented in Python (hence the ‘py’ in py-wc
). As the name implies, its only use is for counting the number of lines, words, bytes, or characters in the files or directories specified in the input arguments.
I decided to code this up in Python since that’s the language I’m most familiar with. This allowed me to focus more on the workflow of wc
and less on the syntax of the language I was working with.
Instructions
To use this as a command-line tool, I recommend adding the finished script to PATH / system variables. For Windows, create a folder named Aliases
in your C drive: C:/Aliases
, and then add this folder to PATH. Next, create a batch file that will execute when you call the specified alias. For example, on my machine, I have a batch file named wc.bat
located at C:/Aliases
, that contains the following script:
1
2
3
@echo off
echo.
python C:\...\GitHub\py-wc\main.py %*
So now, when I type wc
in the command prompt, this batch file will execute, which in turn, runs the py-wc
Python script.
Examples
py-wc
allows you to execute typical Unix-style wc
commands.
Here we see the line count for a single file:
C:\> wc test.txt -l
7145 test.txt
7145 total
lines
Byte count:
C:\> wc test.txt -c
342185 test.txt
342185 total
bytes
Character count:
C:\> wc test.txt -m
339289 test.txt
339289 total
chars
And word count:
C:\> wc test.txt -w
58164 test.txt
58164 total
words
You can also mix and match flags:
C:\> wc test.txt -w -l
7145 58164 test.txt
7145 58164 total
lines words
C:\> wc test.txt -w -l -c -m
7145 58164 339289 342185 test.txt
7145 58164 339289 342185 total
lines words chars bytes
And the order in which you pass the flags does not matter:
C:\> wc -w -l test.txt
7145 58164 test.txt
7145 58164 total
lines words
C:\> wc -w -l -c -m test.txt
7145 58164 339289 342185 test.txt
7145 58164 339289 342185 total
lines words chars bytes
If you don’t pass any flags, you get lines, words, and bytes by default:
C:\> wc test.txt
7145 58164 342185 test.txt
7145 58164 342185 total
lines words bytes
You can also pass in more than one file:
C:\> wc test.txt test2.txt
7145 58164 342185 test.txt
26 136 814 test2.txt
7171 58300 342999 total
lines words bytes
Or, you can pass in a directory:
C:\> wc test_text
7145 58164 342185 test_text\test.txt
26 136 814 test_text\test2.txt
7171 58300 342999 total
lines words bytes
Or multiple directories:
C:\> wc test_text test_text2
44 121 1453 test_text\graph.py
21 47 568 test_text\node.py
7145 58164 342185 test_text\test.txt
26 136 814 test_text\test2.txt
7145 58164 342185 test_text2\test.txt
26 136 814 test_text2\test2.txt
14407 116768 688019 total
lines words bytes
Finally, you can specify file extensions to ignore:
C:\> wc test_text test_text2 -i .py
7145 58164 342185 test_text\test.txt
26 136 814 test_text\test2.txt
7145 58164 342185 test_text2\test.txt
26 136 814 test_text2\test2.txt
14342 116600 685998 total
lines words bytes
And you can even specify directory names to ignore! First we call wc
on a python project, without ignoring any file extensions or directories:
C:\> wc matching-algorithms -l
6 matching-algorithms\.gitignore
22 matching-algorithms\README.md
2 matching-algorithms\.git\COMMIT_EDITMSG
16 matching-algorithms\.git\config
1 matching-algorithms\.git\description
2 matching-algorithms\.git\FETCH_HEAD
2 matching-algorithms\.git\HEAD
7 matching-algorithms\.git\index
2 matching-algorithms\.git\ORIG_HEAD
16 matching-algorithms\.git\hooks\applypatch-msg.sample
25 matching-algorithms\.git\hooks\commit-msg.sample
175 matching-algorithms\.git\hooks\fsmonitor-watchman.sample
9 matching-algorithms\.git\hooks\post-update.sample
15 matching-algorithms\.git\hooks\pre-applypatch.sample
50 matching-algorithms\.git\hooks\pre-commit.sample
14 matching-algorithms\.git\hooks\pre-merge-commit.sample
54 matching-algorithms\.git\hooks\pre-push.sample
170 matching-algorithms\.git\hooks\pre-rebase.sample
25 matching-algorithms\.git\hooks\pre-receive.sample
43 matching-algorithms\.git\hooks\prepare-commit-msg.sample
79 matching-algorithms\.git\hooks\push-to-checkout.sample
129 matching-algorithms\.git\hooks\update.sample
7 matching-algorithms\.git\info\exclude
11 matching-algorithms\.git\logs\HEAD
11 matching-algorithms\.git\logs\refs\heads\main
33 matching-algorithms\.git\logs\refs\remotes\origin\HEAD
11 matching-algorithms\.git\logs\refs\remotes\origin\main
12 matching-algorithms\.git\objects\00\0d21fcd4cb2ca6e59ef2b2002cb1048d541f27
4 matching-algorithms\.git\objects\01\6ca9c5ca0ff63978f9922ad01a1825c485a6bd
1 matching-algorithms\.git\objects\03\337b5c7dbec27bf6f6e6924006a0c119807769
6 matching-algorithms\.git\objects\03\efc0b7f527818bb540c6c5584ba612ecc1f594
2 matching-algorithms\.git\objects\04\caa8e53017b4ab5a660ca8fdb2583eb5362ac2
5 matching-algorithms\.git\objects\05\a357455eee5604cf4df366252cfe9c8f2f135e
1 matching-algorithms\.git\objects\09\2bf3e7900f85f5f61af91578883c0a27fbd979
8 matching-algorithms\.git\objects\09\34ea9b62bd31b4bfd8c4299f0c99ba8eb7d074
4 matching-algorithms\.git\objects\0a\66ffa21b808f14e472b8923cc06ee17b6b0e30
3 matching-algorithms\.git\objects\0b\96665845a9fc038ece44f13d56851fdb3e9913
3 matching-algorithms\.git\objects\0c\c5762af493d1dc5d0e7cc6afba4c9a8cd59e1a
4 matching-algorithms\.git\objects\0f\772273d50d95af0e8228ac1e3991802ca133af
2 matching-algorithms\.git\objects\10\5cb69c8126a9821a9b60eefa545493bea7af69
8 matching-algorithms\.git\objects\14\09ea0477d18d1265ba2198dec5aba43311f919
6 matching-algorithms\.git\objects\14\9e179ddcef5f250a0d457cbbc8a8598a7ea56d
2 matching-algorithms\.git\objects\16\4dee95843a1040ed4e5786ce0bdb0404fce3c7
1 matching-algorithms\.git\objects\17\9beaa4916f3066fd5a1d118fea0c3981ec0377
2 matching-algorithms\.git\objects\18\9f66776d49cd6466fa0ecc9ac242f298498c0e
1 matching-algorithms\.git\objects\1a\81cd8eb500d40599b5a6af6c3771c1c209b7e8
9 matching-algorithms\.git\objects\24\69195205a89376a0e264940fcd4cbf92e776d3
2 matching-algorithms\.git\objects\2a\c8234a99788f02864a4e52865561f6a5c0cf66
4 matching-algorithms\.git\objects\2c\095aa199c98877d853e6c7418e251f021d2858
6 matching-algorithms\.git\objects\2f\977032ce0b7c592dd6dfc1578daf90465712f6
2 matching-algorithms\.git\objects\30\311bdd6eefa848215aeb374a1e81f0422ba4dc
1 matching-algorithms\.git\objects\35\3f3fbb423644c20310875a4a9b0f320b8472c8
2 matching-algorithms\.git\objects\35\5cfe627e10c353a0622033ec854480aa45a6a5
8 matching-algorithms\.git\objects\36\aa7c1a8250066e1f72602624784c381c7cfe47
12 matching-algorithms\.git\objects\38\7f1ecdc33689b9832facbc79d5a7b350781e9c
4 matching-algorithms\.git\objects\45\289f3fb33ed737ba32477315d2d5dfec5916f5
2 matching-algorithms\.git\objects\46\670337f355e2ce85e2b86b55929a2c0003fce4
4 matching-algorithms\.git\objects\46\b61ec0bbdca71363c33e9a2984ae81aea49d65
9 matching-algorithms\.git\objects\48\2adbc27ccbd61a72a42437f051a7c554697c0e
2 matching-algorithms\.git\objects\49\6f2da837dfc087e1780e769d45a049f9526a39
3 matching-algorithms\.git\objects\49\c1a8aca332004b01ba3b5311dc5df76bbc4412
9 matching-algorithms\.git\objects\4d\a730235cf72ec132fbc58211db69aeef361ddd
10 matching-algorithms\.git\objects\4f\a4ecf0807c318bed1a5b187a094f61d56a8997
9 matching-algorithms\.git\objects\55\893097ffaae27e8966686c0eba5f978fc30f69
2 matching-algorithms\.git\objects\55\e4fafb340aedb627ae6a83f3a47f38d8fd17f7
3 matching-algorithms\.git\objects\57\020d7a093ad291bbb127936d7ba333ca0f417d
1 matching-algorithms\.git\objects\57\ba1a9eb61aabf8a9f355736bf37e4fc35e8a38
1 matching-algorithms\.git\objects\5b\6f10feb2f6dccf6b6646de20786e5984f76ae5
2 matching-algorithms\.git\objects\5c\9f6a1231a64a60cda335aab6b56655817eeea2
2 matching-algorithms\.git\objects\5d\0a4763935dad000434d62731b6aa01ebb130d1
2 matching-algorithms\.git\objects\60\3b2d4ea63da2a1abc5ad988d0778b41ab4ee01
4 matching-algorithms\.git\objects\62\d256746da9d28392b4ba52a7165feb2f8dbeeb
11 matching-algorithms\.git\objects\65\11e9b9a71e21bd93ebd7b6c56f204cad5bf151
2 matching-algorithms\.git\objects\66\9366557b06d1c037613a6e846931ebbba4fd63
1 matching-algorithms\.git\objects\6f\9509c88bed7080d496fc5e1d87a9315e30549d
2 matching-algorithms\.git\objects\75\7c54dc0b4bf7edb1f1cdd89b0f441482274998
4 matching-algorithms\.git\objects\76\cb89c01d795dd41ccbaa0d1080a5c16807f67c
2 matching-algorithms\.git\objects\7b\004c01d76cc06d5d8248ab31cc049cc19f5383
2 matching-algorithms\.git\objects\7d\24640b77c38b84b12d081f519e08238784d52a
2 matching-algorithms\.git\objects\7d\83f4a503dc10bae5d19a6711a5e05398bff429
2 matching-algorithms\.git\objects\7e\76c1ad1732110f153d3828af39dbc6e0352aed
1 matching-algorithms\.git\objects\7f\de9be408eaf61e6479123dd648631e587a69e1
2 matching-algorithms\.git\objects\83\18423da9a4f73461bcddc93203ef361722acd3
12 matching-algorithms\.git\objects\83\f6d95476afa6fb88f8dbe5dc94ec5534897384
3 matching-algorithms\.git\objects\85\29717a14738aa61b0d1fbb5392e9a32a95c838
2 matching-algorithms\.git\objects\89\3fe505d9af10f04abc4a4a7c8111a3eeffbd58
1 matching-algorithms\.git\objects\8d\a5ebfafbf3ef7a44680d3aa40833fbfc1336c3
3 matching-algorithms\.git\objects\8e\5dff36b31240309255cda561b41384c1532753
2 matching-algorithms\.git\objects\8f\87bb0cd9eb91644f865c9b34e739cfd2e52e89
2 matching-algorithms\.git\objects\91\b5f4e717c244518bebe7dc700ed0ea8bb53057
4 matching-algorithms\.git\objects\91\cd93c59e08145be520068c1b2149168859f86c
14 matching-algorithms\.git\objects\92\220a0b842ef847474a5577920aae801b5a9bb8
2 matching-algorithms\.git\objects\9d\fb9af432fd7ada9c839d2f1f648457d620e609
2 matching-algorithms\.git\objects\9e\32406d8f0b61943382fc09028b23a7eac6ce24
2 matching-algorithms\.git\objects\a4\7b497c678a3c9e080567170c22e917b856d154
1 matching-algorithms\.git\objects\a7\9d0521b0ead0b83df67faacc89d86709b43ad6
1 matching-algorithms\.git\objects\a7\b4cdd90a69728f1b5d0f44048ea17fe784b65b
17 matching-algorithms\.git\objects\a9\f0bf8781c8154d7c2b64c1bcfe450388bc48b5
1 matching-algorithms\.git\objects\ab\2d599d0ab5cc83aeca856b9a878c19176b6475
5 matching-algorithms\.git\objects\ae\7ece480dd34cefe0be8fc1468b5b9b40b53b5d
1 matching-algorithms\.git\objects\b2\faac4308bdb86ee184d638fac8ebd933af4bf2
4 matching-algorithms\.git\objects\b6\45f27b781af23c4cb6ee610e8fa0396bea123a
1 matching-algorithms\.git\objects\b8\0c246e7886ac724e9501b5eea46502d70931e1
1 matching-algorithms\.git\objects\bd\037fcb6bffac4ce63c5804fefe38c9a9783e3d
5 matching-algorithms\.git\objects\be\52369b3a6d051d87dd31762b2eac876a0da74e
1 matching-algorithms\.git\objects\bf\4bf806a309ad94e07c5d5aeb79e3213691de99
1 matching-algorithms\.git\objects\c7\a8e9372af0bfd72531a1a51d109c4fcf6bb67f
3 matching-algorithms\.git\objects\c8\6e2e430e1a990dc06cf4064f40f843d8ee19f8
13 matching-algorithms\.git\objects\c8\e7f121be82ca1b2d82a0b3c9c78cc9538d39e2
2 matching-algorithms\.git\objects\cc\33fad8fe23fe6a6358ec67800697a7edf04691
2 matching-algorithms\.git\objects\cd\620f8d2a5e25bdf63571ec62c6343f8d8a9db3
7 matching-algorithms\.git\objects\d1\ea52f806e033a972ba604ea3d84894c5743104
2 matching-algorithms\.git\objects\d7\6b6bef3a7b9e11fe3ef54b363b500d1f7dacaf
2 matching-algorithms\.git\objects\dd\9107788044beeac219f388ec407b7ee7963ef4
1 matching-algorithms\.git\objects\dd\d90a8241e81b1c2680932fd13ff66456aab8c8
2 matching-algorithms\.git\objects\de\35ae3f7705048f79c5c311c8909deecb910a6f
1 matching-algorithms\.git\objects\df\e0770424b2a19faf507a501ebfc23be8f54e7b
12 matching-algorithms\.git\objects\df\f061105e5f4c646404e74c10005642fe66f230
3 matching-algorithms\.git\objects\e0\eef57f311949b04717e887a623319ef7140d59
1 matching-algorithms\.git\objects\e1\2b850fa2dff1081315b74ac30580d3e7c7bf9f
3 matching-algorithms\.git\objects\e2\6698cbd1eba7a04d6843344464358d67d39c13
1 matching-algorithms\.git\objects\e6\9de29bb2d1d6434b8b29ae775ad8c2e48c5391
2 matching-algorithms\.git\objects\e7\8c23e4f705f854e23b966f277620ddead47267
1 matching-algorithms\.git\objects\ec\554de06cb005d3482d5db863227be29ced8202
2 matching-algorithms\.git\objects\ec\f68c4d00d8ec5a65bdb18d8a48bcad1259ac8e
2 matching-algorithms\.git\objects\ed\4eca9f0ef03b54f99c8263800709addc1acbad
1 matching-algorithms\.git\objects\f0\068bae1b50f7b8ce62374819becac2c1ac395c
3 matching-algorithms\.git\objects\f4\5f4249e989b919e37e83252eb477b772e4dee0
2 matching-algorithms\.git\objects\f9\4a132a71a830d406fdd437614f12759bb3e825
5 matching-algorithms\.git\objects\fa\19b2bb66dde866581c81e85c2cf1f590252c26
1 matching-algorithms\.git\objects\fb\75d698e6ebc711e2e7e16123ec47c3198b7dd4
2 matching-algorithms\.git\objects\fd\6690d1380958c5f22a25bed4f5ca34bd20f68f
3 matching-algorithms\.git\objects\ff\75481a8bd2251e860820af9fe718d8ed198958
2 matching-algorithms\.git\refs\heads\main
2 matching-algorithms\.git\refs\remotes\origin\HEAD
2 matching-algorithms\.git\refs\remotes\origin\main
21 matching-algorithms\python\data_generator.py
44 matching-algorithms\python\graph.py
21 matching-algorithms\python\node.py
95 matching-algorithms\python\test_da.py
394 matching-algorithms\python\test_ttc.py
39 matching-algorithms\python\algos\da_utils.py
92 matching-algorithms\python\algos\deferred_acceptance.py
61 matching-algorithms\python\algos\top_trading_cycle.py
80 matching-algorithms\python\algos\ttc_utils.py
1 matching-algorithms\python\algos\__init__.py
15 matching-algorithms\python\algos\__pycache__\da_utils.cpython-311.pyc
30 matching-algorithms\python\algos\__pycache__\deferred_acceptance.cpython-311.pyc
34 matching-algorithms\python\algos\__pycache__\top_trading_cycle.cpython-311.pyc
13 matching-algorithms\python\algos\__pycache__\ttc_utils.cpython-311.pyc
2 matching-algorithms\python\algos\__pycache__\__init__.cpython-311.pyc
6 matching-algorithms\python\__pycache__\data_generator.cpython-311.pyc
31 matching-algorithms\python\__pycache__\graph.cpython-311.pyc
6 matching-algorithms\python\__pycache__\node.cpython-311.pyc
2316 total
lines
Yikes. Now, let’s call wc
on the same directory, but ignore any compiled python bytecode files (.pyc), .gitignore files, and any .git directories:
C:\> wc matching-algorithms -l -i .pyc .gitignore .git
22 matching-algorithms\README.md
21 matching-algorithms\python\data_generator.py
44 matching-algorithms\python\graph.py
21 matching-algorithms\python\node.py
95 matching-algorithms\python\test_da.py
394 matching-algorithms\python\test_ttc.py
39 matching-algorithms\python\algos\da_utils.py
92 matching-algorithms\python\algos\deferred_acceptance.py
61 matching-algorithms\python\algos\top_trading_cycle.py
80 matching-algorithms\python\algos\ttc_utils.py
1 matching-algorithms\python\algos\__init__.py
870 total
lines
Much better!
Libraries
Nothing fancy required - just some basic I/O with Python’s built-in open
function, command-line parsing with the argparse
library (which comes standard as part of the Python Standard Library), and some directory navigation with os
(also part of the Python Standard Library).
If you’ve never used argparse
, I highly recommend it for handling parsing of command-line inputs for programs, here’s an example from my repo for this project:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
if __name__ == '__main__':
# Create an ArgumentParser object
parser = argparse.ArgumentParser(description='Process text file(s).')
# Add arguments for input file(s) and/or dir(s)
parser.add_argument(
'input_files_or_dirs',
nargs='+',
type=str,
help='Path to the input file(s) and/or dir(s). Pass no options with input file to compute -l (line count), -w (word count), and -c (byte count)'
)
# Add flags for different options, store as True/False
parser.add_argument('-c', '--bytes', action='store_true', help='Count bytes')
parser.add_argument('-l', '--lines', action='store_true', help='Count lines')
parser.add_argument('-w', '--words', action='store_true', help='Count words')
parser.add_argument('-m', '--characters', action='store_true', help='Count characters')
# Add a flag to specify extensions to ignore, set default to empty list, allow multiple args to be passed
parser.add_argument('-i', '--ignore-extensions', default=[], nargs='+', help='List of file extensions to ignore')
# Parse the command-line arguments
args = parser.parse_args()
# Count how many optional flags were passed out of -c, -l, -w, and -m (ignore -i)
optional_flags = sum([args.bytes, args.lines, args.words, args.characters])
# more code below...
The full code for this project can be found at my Github here.
Acknowledgements
Thanks to John Crickett for the idea from his site, Coding Challenges!
Text samples were downloaded from this site.
If you happen to peruse my code and notice any bugs or opportunities for optimizations, please let me know!