
Advanced Linux Tips: Moving Beyond the Basics the Hard Way

The day I realized I didn't actually know Linux

So there I was, staring at a production server that had just ground to a complete halt. It was about 2:00 AM—because naturally, servers only break when you're deeply asleep—and I was panicking. I had been using Linux for maybe three years at that point. I could navigate directories, I knew how to use grep without looking up the man page every time, and I felt pretty confident.

But this time, top showed the CPU sitting at 99% I/O wait (the wa figure), memory was mostly free, and yet nothing was being written to disk. I tried to restart the service, and it just hung. I tried to kill the process, and it ignored me. That night taught me a painful lesson: there is a massive canyon between "knowing how to use the terminal" and "understanding how the OS actually thinks." I spent four hours debugging that issue only to realize it was a stale NFS mount locking up the I/O. If I had known about processes in D state (uninterruptible sleep, which don't respond even to SIGKILL while they are blocked) back then, I would have been back in bed in ten minutes.
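If you suspect the same thing, a quick first check is to list the processes currently in D state. The sketch below is an illustration, not a standard tool: the function names stat_state and list_d_state are made up, and it reads /proc/PID/stat directly (the state is the third field) instead of assuming ps is installed.

```shell
#!/usr/bin/env bash
# Sketch: list processes in uninterruptible sleep (state D) by reading /proc
# directly, so it also works on minimal systems without the ps utility.

# Extract the state letter from one line of /proc/PID/stat. Field 2 (the
# command name) is wrapped in parentheses and may itself contain spaces or
# ")", so take the first word after the LAST ") " in the line.
stat_state() {
  local rest=${1##*") "}
  printf '%s\n' "${rest%% *}"
}

list_d_state() {
  local f line pid
  for f in /proc/[0-9]*/stat; do
    # A process can exit between the glob and the read; skip it quietly.
    line=$(cat -- "$f" 2>/dev/null) || continue
    if [[ $(stat_state "$line") == D ]]; then
      pid=${line%% *}
      # cmdline is NUL-separated; kernel threads have an empty one.
      printf '%s\t%s\n' "$pid" "$(tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline")"
    fi
  done
}
```

Running list_d_state a few times in a row is the useful part: a PID that stays in D state across runs is stuck, not just briefly waiting on the disk.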

That's the gap I want to help you cross today. We're going to look at the stuff that usually only comes up when things are on fire.

1. The "Disk Full" Lie (Understanding Inodes)

Here's a scenario that drove me crazy a few years back. I got a frantic call that a web server couldn't write session files. I logged in, ran df -h, and saw this:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       50G   20G   30G   40% /

The disk was not even half full. I thought the monitoring system was glitching. But every time I tried to touch a new file, I got a "No space left on device" error. I honestly thought the hard drive controller had failed.

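The problem wasn't space; it was structure. Every file on a Linux filesystem (mostly ext4 or XFS) needs an inode, a data structure that stores metadata like permissions and ownership. You have a finite number of these: on ext4 the count is fixed when the filesystem is created, while XFS allocates inodes dynamically and hits this limit far less often.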

If you run an application that generates millions of tiny files (like a PHP session cache that isn't garbage collecting properly, or a mail spool), you run out of inodes before you run out of gigabytes. The command you actually need is df -i.

The Fix: When I ran df -i, usage was at 100%. I had to find the directory hoarding them. This command became my best friend:

for i in /*; do echo "$i"; find "$i" -xdev 2>/dev/null | wc -l; done

It's slow, but it works. These days, on modern distros like Ubuntu 22.04, I usually format ext4 partitions with mkfs.ext4 -T largefile4 (one inode per 4 MiB) if I know I'm storing big files, or -T small if it's a mail server, to adjust the inode ratio. But checking inodes should always be your second step after checking space.
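A faster way to run the same hunt, assuming GNU find (its -printf option is not POSIX), is to print each entry's parent directory and count the repeats in one pass. top_inode_dirs is a hypothetical helper name; run it as root so permission errors don't hide directories.

```shell
#!/usr/bin/env bash
# Sketch: count directory entries per parent directory to find where the
# inodes went. Assumes GNU find (for -printf).
# Usage: top_inode_dirs [ROOT] [LIMIT]
top_inode_dirs() {
  local root=${1:-/} limit=${2:-10}
  # -xdev keeps find on ROOT's filesystem, so other mounts (/proc, NFS
  # shares) are skipped. %h prints each entry's parent directory, so every
  # output line stands for roughly one inode used in that directory.
  find "$root" -xdev -printf '%h\n' 2>/dev/null |
    sort | uniq -c | sort -rn | head -n "$limit"
}
```

For example, top_inode_dirs /var 20 lists the twenty directories under /var holding the most entries, which is usually enough to spot a runaway session or spool directory.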

2. Stop trusting `top`: Use `strace`

Look, top and htop are great for a quick glance. I use htop daily. But they only tell you what is happening (high CPU), not why. If a process is stuck, top is useless.

I remember debugging a Python script that would just freeze randomly after three days of uptime. No logs, no errors, just silence. I attached strace to the running process ID (PID), and the mystery unraveled in seconds.

strace intercepts system calls. It shows you exactly what the program is asking the kernel to do. To use it on a running process:

sudo strace -p 12345

You'll see a wall of text. Don't panic. You're looking for patterns. In my Python case, I saw endless repeated calls to connect() trying to hit an external API that had changed its firewall rules. The script didn't have a timeout set, so it just waited forever.

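Pro Tip: The output is overwhelming, so filter it. If you suspect file issues, use -e trace=openat,read,write (modern glibc opens files with openat() rather than open(), so tracing only open can show nothing). If it's network, use -e trace=network. It turns a black box into a glass box.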
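To keep those flags straight, one option is a small wrapper. strace_pid is a hypothetical helper, not part of strace: it checks that the PID exists, then calls sudo strace with the filter you pass (network by default) plus a few flags that make a hang easier to read.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around strace for an already running process.
# Usage: strace_pid PID [FILTER]   e.g. strace_pid 12345 openat,read,write
strace_pid() {
  local pid=${1:-} filter=${2:-network}
  # Refuse non-numeric input and PIDs that are not currently running.
  if [[ ! $pid =~ ^[0-9]+$ || ! -d /proc/$pid ]]; then
    echo "strace_pid: no such process: '$pid'" >&2
    return 1
  fi
  # -f   with -p, attach to all threads of the process (and new children)
  # -tt  print wall-clock timestamps with microseconds
  # -T   show time spent inside each syscall, so a slow connect() stands out
  sudo strace -f -tt -T -e trace="$filter" -p "$pid"
}
```

For a frozen script like the one above, strace_pid 12345 network would show which connect() or poll() call it is sitting in.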

3. The Immutable Bit (When `root` can't delete)

This one is technically a security feature, but it's also a great way to prank your junior sysadmin (don't actually do that). Sometimes, even as the root user, you cannot delete or modify a file. You get "Operation not permitted."

You check permissions: -rwxrwxrwx. You check ownership. You check ACLs. Nothing makes sense. I ran into this when cleaning up a hacked server. The attacker had placed a backdoor and locked it down so standard cleanup scripts couldn't touch it.

They used file attributes. Specifically, the immutable bit.

You can see these with lsattr. If you see an i in the attribute string, the file can't be modified, deleted, or renamed, not even by root, until you remove the attribute:

chattr -i malicious_file.php

I now use this on my critical configuration files. If I have a /etc/resolv.conf that NetworkManager keeps trying to overwrite (which is annoying as heck), I just chattr +i it. It's a blunt force instrument, but sometimes you just need the OS to leave your files alone.
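On a compromised box it helps to sweep a whole tree for the flag. lsattr -R prints one line per file with the attribute string first, so a short awk filter can pick out the immutable ones. immutable_from_lsattr is a hypothetical name, and the parsing assumes lsattr's default output (attribute string, space, path; no -v or -p).

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `lsattr -R` output on stdin and print only the
# paths whose attribute field contains the immutable flag "i".
# Usage (as root): lsattr -R /var/www 2>/dev/null | immutable_from_lsattr
immutable_from_lsattr() {
  # The first field must look like an attribute string (letters and dashes),
  # which skips the "dir:" headers and blank lines that lsattr -R prints.
  # Uppercase "I" is a different flag, so only a lowercase "i" counts.
  # sub() strips the attribute field so paths with spaces stay intact.
  awk '$1 ~ /^[-a-zA-Z]+$/ && $1 ~ /i/ { sub(/^[^ ]+ /, ""); print }'
}
```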

4. Networking: `ss` is the new `netstat`

I resisted this change for years. I had netstat -tulpn muscle memory burned into my fingers since 2008. But netstat is deprecated, and honestly, ss (socket statistics) is faster and gives more info.

When you're dealing with a high-traffic server with thousands of connections, netstat can actually slow things down because it parses text files under /proc/net (such as /proc/net/tcp), which gets slow with that many sockets. ss queries the kernel directly via netlink.

The command I use 90% of the time is:

ss -tulpn
  • t: TCP
  • u: UDP
  • l: Listening sockets
  • p: Show the process using the socket (requires sudo to see processes owned by other users)
  • n: Numeric (don't try to resolve DNS, which is slow)

Another specific scenario: debugging a database connection pool. You want to see all established connections to PostgreSQL (port 5432). With ss, you can do this filter:

ss -o state established '( dport = :5432 or sport = :5432 )'

It allows for complex logic that netstat simply couldn't handle without piping to grep and awk.
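For the connection-pool case, counting connections per client is often more useful than reading them one by one. This sketch assumes ss is run without -p, -o, or -e, so the peer address is the last column; peers_by_count is a hypothetical helper that groups on that address.

```shell
#!/usr/bin/env bash
# Sketch: group established connections by peer address, e.g. to see which
# app servers are holding PostgreSQL connections. Reads ss output on stdin.
# Usage on the database host (-H drops the header in recent iproute2):
#   ss -Htn state established '( sport = :5432 )' | peers_by_count
peers_by_count() {
  # The last column is "peer_address:port". Strip only the final ":port"
  # so IPv6 peers such as [::1]:5432 keep their inner colons.
  awk '{ peer = $NF; sub(/:[^:]*$/, "", peer); print peer }' |
    sort | uniq -c | sort -rn
}
```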

5. Bash Safety Rails: Stop writing dangerous scripts

We've all heard the horror stories of rm -rf $VAR/* where $VAR turned out to be empty, resulting in rm -rf /*. I wiped a development mount doing exactly this about five years ago. I felt sick to my stomach.

Since then, I never write a Bash script longer than 5 lines without what I call the "safety rails." Adding this one line to the top of your script changes everything:

set -euo pipefail

Let's break down why this is non-negotiable for me now:

  • set -e: The script exits immediately if a command returns a non-zero exit code (commands tested by if or while, or chained with && and ||, are exempt). No more cascading failures where the first command fails and the second command destroys data based on the failure.
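  • set -u: Treats unset variables as an error. If I try to use $UNDEFINED_VAR, the script stops. This would have saved my development mount if the variable was truly unset; a variable that is set but empty slips through, which is why I also write ${VAR:?} in rm paths, since it aborts in both cases.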
  • set -o pipefail: By default, if you do cmd1 | cmd2, the exit code is whatever cmd2 returns. If cmd1 fails but cmd2 succeeds, Bash thinks everything is fine. With this option, the pipeline returns the status of the rightmost command that failed, so a failure anywhere in the pipe is reported.
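Here is what the rails look like in a small script. clean_cache is a hypothetical function made up for illustration; the interesting part is the ${dir:?} guard, which covers the empty-variable case that set -u alone misses.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical cleanup helper: empties a cache directory.
clean_cache() {
  # ${1:?...} aborts when the argument is unset OR empty; set -u alone
  # only catches unset variables, not ones set to "".
  local dir=${1:?usage: clean_cache DIR}
  if [[ ! -d $dir ]]; then
    echo "clean_cache: not a directory: $dir" >&2
    return 1
  fi
  # "${dir:?}" guards the rm line itself and "--" stops option parsing.
  # Note that * does not match dotfiles, so hidden files are left alone.
  rm -rf -- "${dir:?}"/*
}
```

Called as clean_cache "$CACHE_DIR" with an empty variable, the script stops with the usage message instead of turning into rm -rf /*.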

Also, stop parsing the output of ls. Filenames can contain spaces, newlines, and other garbage. Use globs (*.txt) or find -print0 paired with a NUL-aware reader (xargs -0, or read -d '' in Bash) instead. It feels clunky at first, but it's the only reliable way to write robust scripts.
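A sketch of the find -print0 approach, assuming Bash 4.4 or newer for mapfile -d '' (on older Bash, a while IFS= read -r -d '' loop does the same job). count_matching is a hypothetical name; the same files array can be looped over safely with for f in "${files[@]}".

```shell
#!/usr/bin/env bash
# Sketch: count files matching a pattern without ever parsing ls.
# -print0 ends each name with a NUL byte, the one character a filename
# cannot contain, so spaces and even newlines in names survive intact.
count_matching() {
  local dir=$1 pattern=$2
  local -a files=()
  # mapfile -d '' (Bash 4.4+) splits the stream on NUL instead of newline;
  # -t drops the trailing NUL from each element.
  mapfile -d '' -t files < <(find "$dir" -type f -name "$pattern" -print0)
  echo "${#files[@]}"
}
```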

6. Journalctl: It's not actually that bad

When systemd first took over the major distros, I hated it. I missed text logs in /var/log. But I've come around, mostly because journalctl offers filtering that plain text files can't match without heavy regex.

The problem most people have is that running journalctl dumps the entire history of the universe onto your screen. You have to be specific.

Here is the command sequence I use when a service crashes:

  1. journalctl -u nginx --since "10 minutes ago" (Limits scope significantly)
  2. journalctl -f -u nginx (Follows the log in real-time, like tail -f)
  3. journalctl -p err (Shows only messages at priority err or worse, i.e. crit, alert, and emerg too, ignoring info/debug noise)

There's also a cool feature for checking previous boots. If your server crashed and rebooted, the logs are usually hard to correlate. journalctl --list-boots shows you the boot history, and journalctl -b -1 shows you the logs from the previous boot session. That is invaluable for diagnosing kernel panics or power failures, as long as the journal is persistent: if it only lives in /run/log/journal, it is gone after the reboot.
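When I want those filters together, a small function saves typing. recent_errors is a hypothetical helper that only combines real journalctl flags: -u for the unit, -b for the boot offset (0 is the current boot, -1 the previous one), -p err for severity, and --no-pager so the output can be piped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: error-level journal entries for one unit and boot.
# Usage: recent_errors UNIT [BOOT]   e.g. recent_errors nginx -1
recent_errors() {
  local unit=${1:?usage: recent_errors UNIT [BOOT]}
  local boot=${2:-0}
  # -b 0 is the current boot and -b -1 the previous one (needs a persistent
  # journal); -p err also includes crit, alert and emerg.
  journalctl -u "$unit" -b "$boot" -p err --no-pager
}
```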

FAQ: Advanced Linux Questions

Is it still worth learning Vim/Neovim or should I just use VS Code?

I use VS Code for big projects, but you absolutely need to know Vim basics. When you SSH into a broken server, you don't have VS Code. You have a terminal. You don't need to be a wizard with macros and registers, but you need to know how to edit a config file, search for a string (/), and save/quit (:wq). It's a survival skill, not a lifestyle choice.

Why does my memory usage show 90% even when nothing is running?

This is the most common "false alarm" in Linux. Linux hates wasted RAM. If your applications aren't using it, the kernel uses it for disk caching to speed up file access. This is listed as "buff/cache" in free -h. If an app needs that RAM, the kernel reclaims the cache on demand. Don't panic unless the "available" column is low or you see swap usage spiking.
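If you want that check in a script rather than eyeballing free -h, the kernel exposes its estimate of memory available to applications as MemAvailable in /proc/meminfo (Linux 3.14 and later). The sketch assumes that field is present; mem_available_pct is a hypothetical name, and it takes an optional file path so it can be tested against a saved copy.

```shell
#!/usr/bin/env bash
# Sketch: print MemAvailable as a whole-number percentage of MemTotal.
# MemAvailable estimates memory that can be given to applications without
# swapping, counting reclaimable page cache as available.
mem_available_pct() {
  local meminfo=${1:-/proc/meminfo}
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0 && avail != "") printf "%d\n", avail * 100 / total; else exit 1 }' "$meminfo"
}
```

A monitoring check could then alert on something like [ "$(mem_available_pct)" -lt 10 ] instead of on "used" memory.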

Zsh vs Bash: Which one should I focus on?

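For interactive use (your daily terminal), Zsh with the "Oh My Zsh" framework is superior. The auto-completion, themes, and plugins make life 10x better. However, for scripting, stick to Bash (or strictly POSIX sh). Bash is installed by default on nearly every mainstream server distro; Zsh often isn't. Write your scripts for Bash with a #!/usr/bin/env bash (or #!/bin/bash) shebang so they are portable, and keep Zsh as your interactive shell; the shebang makes sure Bash runs the script even when you launch it from Zsh.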

How do I practice these advanced concepts without breaking my laptop?

Don't run these experiments on your main OS. Use a virtual machine or a cheap VPS. Personally, I like using Docker containers for quick tests. You can spin up an Ubuntu container with docker run --rm -it ubuntu bash, delete the /etc directory just to see what happens, and then exit; --rm removes the container for you. It's risk-free chaos.

So, where does this leave you?

The transition from "Linux user" to "Linux Admin" isn't about memorizing more flags. It's about changing how you react to errors. When I started, an error message was a stop sign. Now, it's just the first clue in a scavenger hunt.

You're going to break things. You're going to accidentally chown the wrong directory or lock yourself out of SSH (I still do this about once a year). The trick isn't to be perfect; it's to know which tool—whether it's strace, tcpdump, or just a really specific find command—will help you dig your way out of the hole. Go break something (safely), and fix it. That's the only way the knowledge sticks.
