Old Skool Unix part 1 - cut
You do know the most used key combinations ever, right? Ctrl-C, Ctrl-X and Ctrl-V. Also known as copy, cut and paste.
But did you know that cut and paste are also separate utilities in the UNIX ecosystem?
Here’s a quick guide on how to use cut, a utility to cut out certain parts from some input (and since we’re using UNIX, that input can be a file as well as the keyboard).
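To make the "file as well as the keyboard" part concrete, here is a minimal sketch. The file demo.txt and its contents are made up for illustration; the only point is that cut reads standard input when it gets no file name.

```shell
# demo.txt and its contents are hypothetical, purely for illustration.
tmpdir=$(mktemp -d)
printf 'alice:1001\nbob:1002\n' > "$tmpdir/demo.txt"

# 1. File name given as an argument
from_arg=$(cut -d: -f1 "$tmpdir/demo.txt")

# 2. File redirected to standard input
from_redirect=$(cut -d: -f1 < "$tmpdir/demo.txt")

# 3. Output of another command piped to standard input
from_pipe=$(printf 'alice:1001\nbob:1002\n' | cut -d: -f1)

# All three variables now hold the same two names.
printf '%s\n' "$from_arg"

rm -r "$tmpdir"
```

Run cut -d: -f1 with no file and no pipe, and it simply waits for you to type lines on the keyboard until you press Ctrl-D.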
Let’s make a quick demo file, say a few lines from a web server access log. What we want is a list of all the paths that got accessed (the URIs).
101.198.0.156 - - [16/Jul/2025:05:39:16 +0200] "GET /sitemap.xml HTTP/1.1" 200 4459 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
101.198.0.156 - - [16/Jul/2025:05:39:46 +0200] "GET /config.json HTTP/1.1" 404 3823 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
5.255.231.115 - - [16/Jul/2025:05:41:10 +0200] "GET /robots.txt HTTP/1.1" 404 3910 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
5.255.231.170 - - [16/Jul/2025:05:41:11 +0200] "GET /posts/annoying-poetry-bug/ HTTP/1.1" 200 6386 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
104.210.140.129 - - [16/Jul/2025:05:53:34 +0200] "GET /robots.txt HTTP/1.1" 404 3838 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot"
52.21.62.139 - - [16/Jul/2025:08:34:12 +0200] "GET /tags/coding/ HTTP/1.1" 200 5380 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36"
66.249.74.162 - - [16/Jul/2025:10:15:49 +0200] "GET /robots.txt HTTP/1.1" 404 3879 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.164 - - [16/Jul/2025:10:15:50 +0200] "GET /tags/ HTTP/1.1" 304 3709 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like
Now this looks quite messy. Yes, you could pick out the paths by eye, but for starters these lines are very long and will probably wrap around on your screen. I even cut the last one in half, but that was a bit unintentional…
But to get just the path, you have to prepare two things:
- a delimiter character that marks where to place the cuts (the -d option)
- the number(s) of the field(s) you want to cut out, counting from 1 (the -f option). Yes, you can take more than one.
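Before going to the log, here is a small sketch of those two ingredients on a made-up record (the record and the variable names are mine, not part of the demo file). It shows one field, a list of fields, and a range of fields.

```shell
# A made-up record with four colon-separated fields.
record='one:two:three:four'

single=$(printf '%s\n' "$record" | cut -d: -f2)     # one field
picked=$(printf '%s\n' "$record" | cut -d: -f1,3)   # a list of fields
ranged=$(printf '%s\n' "$record" | cut -d: -f2-4)   # a range of fields

printf '%s\n' "$single" "$picked" "$ranged"
# two
# one:three
# two:three:four
```

When you pick more than one field, cut glues them back together with the same delimiter it used to split the line.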
Let’s do this:
$ cut -d'"' -f2 <logfile
GET /sitemap.xml HTTP/1.1
GET /config.json HTTP/1.1
GET /robots.txt HTTP/1.1
GET /posts/annoying-poetry-bug/ HTTP/1.1
GET /robots.txt HTTP/1.1
GET /tags/coding/ HTTP/1.1
GET /robots.txt HTTP/1.1
GET /tags/ HTTP/1.1
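Why field 2? Splitting on the double quote chops each line wherever a quote occurs, and the request is the second piece. A small sketch using the first line of the demo log (the variable names are only there to label the pieces):

```shell
# First line of the demo log, stored in a variable for illustration.
line='101.198.0.156 - - [16/Jul/2025:05:39:16 +0200] "GET /sitemap.xml HTTP/1.1" 200 4459 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"'

# Field 1 is everything before the first quote, field 2 the request line.
request=$(printf '%s\n' "$line" | cut -d'"' -f2)

# Field 3 sits between the 2nd and 3rd quote: status and size,
# including the spaces around them.
status_and_size=$(printf '%s\n' "$line" | cut -d'"' -f3)

# Field 6 is the last quoted string, the user agent.
user_agent=$(printf '%s\n' "$line" | cut -d'"' -f6)

printf '[%s]\n' "$request" "$status_and_size"
# [GET /sitemap.xml HTTP/1.1]
# [ 200 4459 ]
```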
Ok, but we just wanted the paths, right? Well, nobody says we can only cut once. Let’s just do it twice, this time cutting on the spaces!
$ cut -d'"' -f2 <logfile | cut -d' ' -f2
/sitemap.xml
/config.json
/robots.txt
/posts/annoying-poetry-bug/
/robots.txt
/tags/coding/
/robots.txt
/tags/
Perfect, just what we wanted, and without starting a complete programming language like awk, or a complex regular expression parser like grep.
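The same cut-twice pattern works for other parts of a log line too. Below is a sketch for the client address, the request method and the status code. It uses two lines from the demo log, shortened after the referer field to keep things readable, and the variable names are my own.

```shell
# Two lines from the demo log, shortened after the "-" referer field
# (the user agent is left out to keep the example readable).
log='101.198.0.156 - - [16/Jul/2025:05:39:16 +0200] "GET /sitemap.xml HTTP/1.1" 200 4459 "-"
5.255.231.115 - - [16/Jul/2025:05:41:10 +0200] "GET /robots.txt HTTP/1.1" 404 3910 "-"'

# Client address: the first space-separated field, one cut is enough.
clients=$(printf '%s\n' "$log" | cut -d' ' -f1)

# Request method: first word of the quoted request line.
methods=$(printf '%s\n' "$log" | cut -d'"' -f2 | cut -d' ' -f1)

# Status code: field 3 of the quote split is ' 200 4459 ', which starts
# with a space. cut treats every delimiter as a field boundary, so the
# empty text before that space is field 1 and the status code is field 2.
statuses=$(printf '%s\n' "$log" | cut -d'"' -f3 | cut -d' ' -f2)

printf '%s\n' "$statuses"
# 200
# 404
```

The status code example is the one to remember: cut never merges repeated or leading delimiters, so an unexpected empty field usually means there was an extra space somewhere.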
Next I’ll show you how to do statistics on those paths, by using three other nice standard UNIX utilities, wc, sort and uniq.