
Sys Army Knife – What’s in list x but not list y?

It’s time once again to pull out your sys army knife and explore how to best use some of the tools available to system administrators out there! These “sys army knife” posts explore how to use common Linux/Unix command line tools to accomplish tasks that system administrators may encounter day-to-day.

I’m regularly involved in large-scale data center migration projects, so I quite commonly have to look at two different lists of things and figure out which entries are unique to each list.

For instance, I might have a list of machines that we’re planning to migrate. If someone gives me an updated list of machines in the data center, I have to figure out whether there are machines we don’t have to migrate after all, or new machines we have to plan for.

Sysadmins do it with one line

If each of your lists contains only unique values, you can find the lines that appear in file1 but not in file2 with a simple one-liner, like this:

cat file1 file2 file2 | sort | uniq -u

For example, let’s say that I have two lists. The first list is in a file named x and looks like this:

appserver01
appserver02
dbserver01
webserver01
webserver02
webserver03

The second list is in a file named y and looks like this:

appserver02
appserver03
dbserver01
webserver01
webserver03
webserver04

This shows the values unique to x:

$ cat x y y | sort | uniq -u
appserver01
webserver02

… and this shows the lines unique to y:

$ cat y x x | sort | uniq -u
appserver03
webserver04

How does it work?!?

The commands above take one copy of one file and two copies of a second file, sort the combined result, and then print only the lines that occur exactly once.

You start with one copy of the first file, so every line in that file appears once. Then you add two copies of the second file. Any line that is in both files now appears three times, and any line that exists only in the second file appears twice, but any line that exists only in the first file still appears just once. Thus, if you search for lines that occur exactly once in the combined output, you’ll find only the lines that are unique to the first file.
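To watch the counting happen, you can swap ‘uniq -u’ for ‘uniq -c’, which prefixes each line with the number of times it occurred. The sketch below recreates the x and y files from the example above; running it in a scratch directory made with mktemp is my own addition, just so it can’t overwrite any real files named x or y.

```shell
#!/bin/sh
# Work in a throwaway directory so existing files named x or y are untouched.
cd "$(mktemp -d)" || exit 1

# Recreate the two example lists from this post.
printf '%s\n' appserver01 appserver02 dbserver01 webserver01 webserver02 webserver03 > x
printf '%s\n' appserver02 appserver03 dbserver01 webserver01 webserver03 webserver04 > y

# uniq -c shows how many copies of each line survived the cat:
#   3 = in both files, 2 = only in y, 1 = only in x
cat x y y | sort | uniq -c
```

In the output, appserver01 and webserver02 get a count of 1 (only in x), appserver03 and webserver04 get a count of 2 (only in y), and every other line gets a count of 3 (in both). ‘uniq -u’ simply keeps the lines with a count of 1.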

Here’s a little more detail:

The first part of the command (cat file1 file2 file2) concatenates together one copy of file1 and two copies of file2 and spits that out.

We then take the output of that cat command and pipe (‘|’) it to the sort command, which produces a sorted copy of the data it receives. We need to do this because uniq only compares adjacent lines: it won’t notice that two lines are identical unless they sit right next to each other, so it won’t produce correct results if its input isn’t sorted.
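Here’s a quick way to convince yourself that the sort step matters. The three-line input is just a toy example, not anything from the x and y files:

```shell
#!/bin/sh
# Without sort: the two "a" lines are separated by "b", so uniq never sees
# them side by side and treats every line as unique.
printf '%s\n' a b a | uniq -u
# output: a, b and a, each on its own line (nothing was filtered)

# With sort: the two "a" lines become adjacent, so uniq -u drops them.
printf '%s\n' a b a | sort | uniq -u
# output: b
```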

Finally, we pipe the sort output to the ‘uniq’ command. The ‘-u’ option to the uniq command tells it to only print unique lines (i.e., lines that only exist once).

There can be only one…

You may encounter situations where your lists contain duplicate values. If file2 has duplicates but file1 doesn’t, the command chain still works as expected. However, if file1 has duplicates, all of those values will be dropped even if they exist only in file1. This is because ‘uniq -u’ prints only the lines that appear exactly once in its combined input, and a line listed twice in file1 already appears at least twice before file2 is even added.
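Here’s a small demonstration of that pitfall. The file contents are made-up sample data: dbserver02 is listed twice in file1 and not at all in file2, yet it never shows up in the output.

```shell
#!/bin/sh
# Use a scratch directory so we don't overwrite real files named file1/file2.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: dbserver02 is accidentally listed twice in file1.
printf '%s\n' appserver01 dbserver02 dbserver02 webserver01 > file1
printf '%s\n' webserver01 webserver02 > file2

# dbserver02 occurs twice in the combined input, so uniq -u suppresses it
# even though it exists only in file1. Only appserver01 is printed.
cat file1 file2 file2 | sort | uniq -u
```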

The quick and easy way around this is to create a copy of file1 with any duplicates removed before starting:

sort -u file1 > file1.nodupes

Then use that file without the duplicates in the command chain:

cat file1.nodupes file2 file2 | sort | uniq -u
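If you’d rather not leave a file1.nodupes file lying around, one variation (my own sketch, not part of the recipe above) is to pipe the output of ‘sort -u’ straight into cat, which reads standard input when you give it ‘-’ as a file name. The sample data below is made up, with dbserver02 listed twice in file1:

```shell
#!/bin/sh
# Use a scratch directory so we don't overwrite real files named file1/file2.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: dbserver02 appears twice in file1.
printf '%s\n' appserver01 dbserver02 dbserver02 webserver01 > file1
printf '%s\n' webserver01 webserver02 > file2

# sort -u removes the duplicates from file1; cat reads that deduplicated
# copy from standard input ("-") and appends two copies of file2.
# Prints appserver01 and dbserver02, the two lines found only in file1.
sort -u file1 | cat - file2 file2 | sort | uniq -u
```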

The beauty of it all

This may seem like an esoteric problem, but you might be surprised how often it comes up. Here are just a few examples off the top of my head:

  • Find files that are unique between two servers
  • Find installed packages that are unique between two servers
  • Using old and new server lists, figure out which servers are gone and which are new
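If you do this a lot, it may be worth wrapping the whole thing in a small shell function. The sketch below defines a helper named only_in_first; the function name, file names, and server lists are my own hypothetical inventions. It folds in the ‘sort -u’ fix for duplicates in the first file, so it handles the old/new server list case from the list above:

```shell
#!/bin/sh
# only_in_first FILE1 FILE2  (hypothetical helper)
# Print the lines that appear in FILE1 but not in FILE2.
# FILE1 is deduplicated first, so repeated lines in FILE1 aren't lost.
# FILE2 is read twice, so it must be a regular file, not a pipe.
# Assumes FILE2 ends with a newline; otherwise its last line would run
# into the first line of the second copy.
only_in_first() {
  sort -u "$1" | cat - "$2" "$2" | sort | uniq -u
}

# Demo in a scratch directory with made-up server lists.
cd "$(mktemp -d)" || exit 1
printf '%s\n' appserver01 appserver02 dbserver01 > old_servers.txt
printf '%s\n' appserver02 dbserver01 dbserver02 > new_servers.txt

echo "Gone:"; only_in_first old_servers.txt new_servers.txt   # appserver01
echo "New:";  only_in_first new_servers.txt old_servers.txt   # dbserver02
```

For the package case, you might save the output of something like ‘rpm -qa’ from each server (on an RPM-based system) to a file and hand the two files to the function. Pass real files for the second argument rather than a pipe or process substitution, since the function reads it twice and a pipe can only be read once.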

These are all simple, standard commands that exist on pretty much any Unix or Linux system out there. I started using them way back in the late ’80s on a MicroVAX II running ULTRIX and have since used them on multiple versions of AIX, BSD, HP-UX, IRIX, Linux, and SunOS/Solaris.

Biting off more than one can chew…

The astute reader may realize that it’s been several weeks since my last post, and that my detailed review of the rewritten Anaconda installer in Fedora 18 has not been going anywhere particularly fast. The even more astute reader may realize that Fedora 19 was released today, making the completion of that review something of a moot point.

A couple of things I’ve learned so far in my foray into blogging:

Good, detailed blog entries take time.

If you think that you can write a quick blog entry on a complicated subject, think again, especially if you’re a detail-oriented person like I am.

I initially took about two pages of notes and twenty screenshots of my Fedora 18 install experience. I figured I could churn out a one- or two-part blog post based on that in a few hours. After spending something like eight to ten hours on the first three parts of the review, I realized that it would probably take me at least another three parts and a roughly equal amount of time to complete the review. This kind of put a damper on my enthusiasm.

Real life has a tendency to interfere with blogging.

If you have kids, they tend to have a lot of activities lumped together at the end of the school year. Then summer means they’re home all the time, and if you work from home a lot, that creates its own problems (“Dad! Can you help me with…”). And, of course, if you change jobs and find yourself on the road about half the time working 10-12+ hour days, blogging suddenly drops way down on the priority list.

I’ll be taking small bites.

Going forward, in the hopes of actually getting back on track with my original goal of posting something once a week, I’ll be taking small bites. Perhaps I’ll share a hint about a favorite Unix/Linux command line trick, or briefly talk about some new toy, like the Ouya that I just got.

… so it begins.

On December 21, HostGator had a tongue-in-cheek “End of the World” sale, celebrating the day when so many had claimed the Mayans thought the world would end. I’d been meaning to grab the pdwaterman.com domain for a while to use for blogging and otherwise tooting my own horn (er, I mean providing information about myself).

Well, now that more than a month has gone by, I’ve finally decided that it’s time to start putting up some entries. For now, I’ll be using this domain for some simple blogging. Most of it will probably be technical in nature. Since I’m a Linux geek, the majority of it will be Linux-related, although I may occasionally divert into things like the Asterisk PBX, Perl, XBMC, etc.

My goal right now is to get a new blog entry up on a weekly basis. We’ll see how well that goes. 🙂