denjacker

How to use shell functions to fetch information online


Takeaway: Marco Fioretti shows two examples of shell functions that you can use for web scraping when all you need is a quick way to extract text from a given website.

Even in this age of touchscreen devices, many computing activities are much faster if you know the right tricks and stick to plain old typing. In my case, this applies to retrieving certain types of information from the Internet.

Most of my work consists of typing, either at a prompt or in applications, like the Kate text editor or the Dolphin file manager, that have an embedded terminal. In fact, one of the reasons I prefer such applications is precisely that they let me use certain tricks quickly, no matter what else I am doing.

Shell functions like the ones I'm about to present save me a non-negligible amount of time, whether I'm doing system administration or just writing some text. Please note that none of these functions does anything difficult or advanced. All they do is fetch, as quickly as possible, some simple data from the Internet that I often need, without forcing me to switch to another window. They are functions rather than standalone scripts because I also use them inside several scripts.

What are shell functions anyway?

In software programming, functions are blocks of code that perform one specific task, written so that they can easily be reused and shared by many programs, possibly with different input values each time they run.

Unix shells, that is, the command interpreters that actually execute what we type at a prompt or save in a script, support functions just as compiled languages such as C or C++ do. Shell functions can be called either at the prompt or from a script, and you only need to know a few things to start writing and using them:

  • Shell functions must be defined before you invoke them!
  • To have your functions always available at the prompt, you can save them in the $HOME/.bashrc file (or the equivalent one for non-Bash shells).
  • In Bash, the default shell on most GNU/Linux distributions, functions can be defined in these two equivalent ways (only the second one is also understood by plain POSIX shells):

function my_bash_function { echo "the function code goes here"; }

my_bash_function () { echo "the function code goes here"; }
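A slightly fuller sketch, using a hypothetical function called greet, shows the points above in action: the function is defined before it is called, reads its arguments as $1, $2 and so on, and keeps its variables private with local.

```shell
#!/bin/bash
# Hypothetical example function: greet NAME [GREETING]
greet () {
    local name=$1               # first argument
    local greeting=${2:-Hello}  # second argument, "Hello" if omitted
    echo "$greeting, $name!"
}

# The function is already defined at this point, so it can be called
greet World        # prints "Hello, World!"
greet Marco Ciao   # prints "Ciao, Marco!"
```

Pasting the definition (without the two calls) into $HOME/.bashrc makes greet available at every new prompt.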

Weather forecast

A function I (have to) use more often than is good for me, at least in certain periods, is the one that prints the weather forecast. Yes, I too think that looking out the nearest window would be much simpler and smarter, but what if you're in a conference or meeting room without windows? Here is how this function works:


[marco@polaris ~]$ weather
Weather for Rome
4°C Thu Fri Sat Sun
[sun] Clear [sun] [sun] [par] [sun]
Wind: S at 6 km/h
Humidity: 75% 10° 3° 12° 4° 13° 5° 11° 3°
[marco@polaris ~]$

And this is its code:

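weather ()
{
    # Use a private temporary file instead of a fixed /tmp name
    local tmp
    tmp=$(mktemp) || return 1
    # Ask Google for "weather <place>"; spaces in the place name become "+"
    w3m -dump "http://www.google.com/search?hl=en&lr=&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&q=weather+${1// /+}&btnG=Search" > "$tmp" 2>&1
    # Print the first "Weather for" line plus the 5 lines after it,
    # dropping the first 27 (empty) columns of each line
    grep -A 5 -m 1 "Weather for" "$tmp" | cut -c28-
    rm -f "$tmp"
}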

The function uses the w3m text-based browser to ask Google for the weather forecast and saves the resulting text in a temporary file. It then extracts the six lines starting from the first one that contains the string “Weather for” (grep -A 5 -m 1), cuts off the unnecessary empty columns on the left (cut -c28-) and deletes the file. If invoked without arguments, this function returns the forecast for what Google thinks is your current location, but you may also specify other places, e.g. weather "San Francisco".
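Since the text-slicing half of the function does not need the network, it can be tried on any text. The minimal sketch below wraps that half in a hypothetical helper, extract_block, and feeds it a made-up page instead of Google's output; the sample text and the column numbers are assumptions chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the first line matching PATTERN plus the
# AFTER lines that follow it, keeping only the characters from column FROM on.
# usage: extract_block PATTERN AFTER FROM < text
extract_block () {
    grep -A "$2" -m 1 -- "$1" | cut -c"$3"-
}

# Made-up page text: every line starts with 4 columns of padding
sample='    Navigation menu
    Weather for Rome
    4C Clear
    Wind: S at 6 km/h
    Search again
    Weather for Milan'

# Prints the "Weather for Rome" line and the 2 lines after it, without
# the padding; the Milan line is ignored because of -m 1
printf '%s\n' "$sample" | extract_block "Weather for" 2 5
```

In the weather function the same three roles are played by “Weather for”, 5 and 28.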

The general lesson of this function is how easy it is to get started with Web scraping. The term means exactly what you have just seen at work in the example above: download the text version of some Web page, then automatically cut and slice it to extract all and only the data you really need. The function that follows uses the same technique to fetch another kind of information I often need.
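The download-then-slice recipe can be written once and reused. The sketch below is a hypothetical generalization, not one of the functions presented here: scrape downloads a page with w3m -dump through a small fetch_page wrapper (both names are made up), then slices the text exactly as weather does.

```shell
#!/bin/bash
# Download step: print the text rendering of the page at URL ($1),
# discarding w3m's error messages
fetch_page () {
    w3m -dump "$1" 2>/dev/null
}

# Hypothetical generic scraper.
# usage: scrape URL PATTERN AFTER FROM
# Prints the first line of the page matching PATTERN plus the AFTER lines
# that follow it, keeping only the characters from column FROM onward.
scrape () {
    fetch_page "$1" | grep -A "$3" -m 1 -- "$2" | cut -c"$4"-
}
```

Assuming the page still has the layout described above, scrape "http://www.google.com/search?q=weather+Rome" "Weather for" 5 28 does the same job as weather Rome. Keeping the download in its own function also makes it easy to replace w3m with another text-mode fetcher, or with a fake one for testing.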

Word definition

What does that word exactly mean? When I’m in doubt, I ask my shell:

[marco@polaris ~]$ define weird
weird (wîrd)
adj. weird·er, weird·est
1. Of, relating to, or suggestive of the
preternatural or supernatural.
2. Of a strikingly odd or unusual
character; strange.
3. Archaic Of or relating to fate or the
Fates.
n.
1.
a. Fate; destiny.
b. One's assigned lot or fortune,
especially when evil.

The answer, of course, comes from a function very similar to the one that provides weather forecasts:

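define ()
{
    # Use a private temporary file instead of a fixed /tmp name
    local tmp
    tmp=$(mktemp) || return 1
    # Fetch the dictionary page for the word given as first argument
    w3m -dump "http://www.thefreedictionary.com/${1}" > "$tmp" 2>&1
    # Print each line beginning with "Advertisement" plus the 15 lines
    # after it, keeping only columns 20 to 60
    grep -A 15 '^Advertisement' "$tmp" | cut -c20-60
    rm -f "$tmp"
}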

If you want to know why I start extracting text from the line that begins with “Advertisement”, type:

w3m -dump http://www.thefreedictionary.com/weird | more

at a prompt and look closely at the resulting text.

Doing that, you will also notice the biggest difference from the other function, besides the obvious fact that this one queries a different website. The weather block always has the same size, but definitions vary in length and many are much longer than 15 lines, so here I just estimate how many lines are usually enough to give me the information I need. Extracting the whole definition and nothing else, regardless of its length, would certainly be possible, but it requires more advanced text parsing than I can show you here. Besides, it would not be worth the effort in this particular case, when all I want is a quick idea of what some word means.
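For the curious, one common way to grab a section of variable length is to print everything between a start marker and an end marker. The awk sketch below is not the method used above; the helper name print_between is hypothetical and, above all, the end marker is an assumption, because a real page would first have to be inspected (as with w3m -dump ... | more) to find a line that reliably follows the definition.

```shell
#!/bin/bash
# Hypothetical helper: print the lines strictly between the first line
# matching START and the next line matching STOP (both awk regular expressions).
# usage: print_between START STOP < text
print_between () {
    awk -v start="$1" -v stop="$2" '
        found && $0 ~ stop { exit }   # end marker reached: stop reading
        found                         # inside the section: print the line
        $0 ~ start { found = 1 }      # start marker: print from the next line on
    '
}
```

Unlike grep -A 15, this prints the whole section, however long, provided that the chosen end marker really appears after it.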

Credits

The two functions above are my own, updated versions of those I originally fetched from David Crouse. Thanks, David! Web scraping is great, and doing it from shell functions makes it even more flexible.

http://www.techrepublic.com/blog/opensource/how-to-use-shell-functions-to-fetch-information-online/3317
