OLX.ro Scraper (name + phone no. + Yahoo/Skype addresses)

pr00f · February 25, 2015

http://a.pomf.se/pjmwvx.png

"""
OLX.ro scraper
Gets name, phone no., Yahoo! & Skype addresses, where applicable
http://a.pomf.se/pjmwvx.png
"""

import re
import json
import requests
from bs4 import BeautifulSoup as b

pages = 1  # How many pages should be scraped

# Category URL, a.k.a. where to get the ads from
catURL = "http://olx.ro/electronice-si-electrocasnice/laptop-calculator/"

# Links to the Ajax requests
ajaxNum = "http://olx.ro/ajax/misc/contact/phone/"
ajaxYah = "http://olx.ro/ajax/misc/contact/communicator/"
ajaxSky = "http://olx.ro/ajax/misc/contact/skype/"


def getName(link):
    # Get the advertiser's name from the ad page
    page = requests.get(link)
    soup = b(page.text, "html.parser")
    match = soup.find(attrs={"class": "block color-5 brkword xx-large"})
    if match is None:
        return None  # Name element not found
    return match.get_text(strip=True)


def getPhoneNum(aID):
    # Get the phone number
    resp = requests.get("%s%s/" % (ajaxNum, aID)).text
    try:
        resp = json.loads(resp).get("value")
    except ValueError:
        return  # No phone number
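    if resp is None:
        return None  # No "value" field in the response
    if "span" in resp:  # Multiple phone numbers
        nums = b(resp, "html.parser").find_all(text=True)
        for num in nums:
            if num.strip():
                return num  # Only the first non-blank number is returned
        return None
    else:
        return resp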


def getYahoo(aID):
    # Get the Yahoo! ID
    resp = requests.get("%s%s/" % (ajaxYah, aID)).text
    try:
        resp = json.loads(resp).get("value")
    except ValueError:
        return  # No Yahoo! ID
    else:
        return resp


def getSkype(aID):
    # Get the Skype ID
    resp = requests.get("%s%s/" % (ajaxSky, aID)).text
    try:
        resp = json.loads(resp).get("value")
    except ValueError:
        return  # No Skype ID
    else:
        return resp


def main():
    for pageNum in range(1, pages+1):
        print("Page %d." % pageNum)
        page = requests.get(catURL + "?page=" + str(pageNum))
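        soup = b(page.text, "html.parser")

        # Adjacent string literals are joined, so the class value
        # contains single spaces only
        links = soup.findAll(attrs={"class":
                                    "marginright5 link linkWithHash "
                                    "detailsLink"})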

        for a in links:
            aID = re.search(r'ID(.+)\.', a['href']).group(1)
            print("ID: %s" % aID)
            print("\tName: %s" % getName(a['href']))
            # Call each helper once and reuse the result
            phone = getPhoneNum(aID)
            if phone is not None:
                print("\tPhone: %s" % phone)
            yahoo = getYahoo(aID)
            if yahoo is not None:
                print("\tYahoo: %s" % yahoo)
            skype = getSkype(aID)
            if skype is not None:
                print("\tSkype: %s" % skype)

if __name__ == "__main__":
    main()

Tocmai scraper: https://rstforums.com/forum/98245-tocmai-ro-scraper-nume-oras-numar-telefon.rst

Edited March 6, 2015 by pr00f

R3load · February 26, 2015

Green peppers? Awesome.

UnixDevel · February 26, 2015

The thing is, it's easier through the mobile API, because there you have no scraping limits, whereas here, if I remember correctly, there were some limits. I know because I have another scraper of my own for ads.

florinul · February 28, 2015

I set the number of pages to 500; after page 10 it throws an error and doesn't output anything anymore... Is there some protection on the site?

WerBF · February 28, 2015

I'm sure there is. You're sending too many requests in too short a time.
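If the errors really do come from sending requests too quickly, the usual remedy is to pause between requests and to wait longer after a failure. Below is a minimal sketch of that idea, not a tested fix for this site: `fetch_politely` is a hypothetical helper, and the 2-second delay, the retry count and the doubling back-off are assumed values, since the site's actual limits are not stated anywhere in this thread.

```python
import time


def fetch_politely(fetch, urls, delay=2.0, max_retries=3, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between calls.

    fetch is any callable that returns a result or raises an exception.
    delay, max_retries and the doubling back-off are illustrative values,
    not limits documented by the site.
    """
    results = {}
    for url in urls:
        wait = delay
        for _ in range(max_retries):
            try:
                results[url] = fetch(url)
                break
            except Exception:
                # Back off: wait longer after every failed attempt
                sleep(wait)
                wait *= 2
        else:
            results[url] = None  # Gave up on this URL
        sleep(delay)  # Fixed pause before moving on to the next URL
    return results
```

It could be called as `fetch_politely(requests.get, list_of_urls)`. Note that `requests.get` does not raise on an HTTP error status by itself, so for a block to count as a failure the callable would have to call `raise_for_status()` on the response.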

zebra · July 6, 2015

It's in C++, right?

@pr00f

pr00f · July 6, 2015

It's in C++, right?
@pr00f

python

Edited July 6, 2015 by pr00f

Sir-Galahad · July 6, 2015

Phyton

mrgrj · July 6, 2015

It's in C++, right?
@pr00f

It's Python.

Before asking questions like this, do a little research. The answer pr00f gave is ironic. Don't go writing scrapers in Pascal.

zebra · July 6, 2015

@MrGrj can you make one, please?

Sir-Galahad · July 6, 2015

Note that you have a lot of topics where you ask for x or y. Start giving something back too.

Vlachs · July 17, 2015

It's Python.
Before asking questions like this, do a little research. The answer pr00f gave is ironic. Don't go writing scrapers in Pascal.

Why not, pascal rullz

pr00f · July 18, 2015

import requests
ImportError: No module named requests

sudo apt-get install python-requests

or from pip, or with whatever package manager you use, or just look it up on the internet

grmrev · July 31, 2015

awesome share thank you

hades · July 31, 2015

Interesting.

There are a few small things that could be improved, either functionally or PEP 8-wise.

Maybe I'll find the time this weekend and, with pr00f's approval, make some changes to it. Unless I get wasted and forget.

shuttershades · September 25, 2015

Class: PHP OLX Classifieds Scraper - PHP Classes

eugen9f · November 8, 2015

How do I save the results?

razvancentru · December 21, 2017

How do I save the file?

razvancentru · February 24, 2018

Does it still work?

Georgee7 · May 18, 2018

Hi, if I only have the link to a user's profile on OLX, and that user has no ads now but had some in the past, can I still find out their phone number?
