pr00f, posted March 5, 2015 (edited March 6, 2015 by pr00f)

http://a.pomf.se/usqyao.png

"""
tocmai.ro scraper
Gets name, city, phone no.
http://a.pomf.se/usqyao.png
"""

import re
import json

import requests
from bs4 import BeautifulSoup as b

pages = 1  # number of listing pages to scrape (offset steps of 20)
catURL = "http://www.tocmai.ro/anunturi/electronice-si-electrocasnice/"
ajaxNum = "http://www.tocmai.ro/ajax_ad/call/%s/1/%s/"  # ad ID, phone hash


def getSoup(link):
    # Fetch an ad page once and parse it; the helpers below reuse the result
    return b(requests.get(link).text, "html.parser")


def getName(soup):
    # Gets the name from the ad
    return soup.find("a", attrs={"class": "name"}).text


def getLoc(soup):
    # Gets the city
    return soup.find("small", attrs={"itemprop": "itemLocalitate"}).text


def getPhoneNum(soup, aID):
    # Gets the phone number: the page source contains a hash inside an
    # Ad.phone.show(...) call, which is sent to the AJAX endpoint; the JSON
    # reply carries the number in its "img" field
    match = re.search(r"Ad\.phone\.show.*'(.+)'", str(soup))
    if match is None:
        return None
    resp = requests.get(ajaxNum % (aID, match.group(1))).text
    return json.loads(resp).get("img")


def main():
    for pageNum in range(pages):
        print("Page %d\n" % (pageNum + 1,))
        # Listing pages are paginated by offset: incepedela-0, -20, -40, ...
        page = requests.get(catURL + "incepedela-" + str(pageNum * 20))
        soup = b(page.text, "html.parser")
        links = soup.find_all("a", attrs={"class": "record_title"})
        for item in links:
            url = item["href"]
            aID = re.search(r".*-(\d+)\.html", url).group(1)
            adSoup = getSoup(url)  # one request per ad instead of four
            print("%s" % aID)
            print("\tName: %s" % (getName(adSoup),))
            print("\tCity: %s" % (getLoc(adSoup),))
            phone = getPhoneNum(adSoup, aID)  # call once, reuse the result
            if phone is not None:
                print("\tPhone: %s" % phone)


if __name__ == "__main__":
    main()

OLX scraper: https://rstforums.com/forum/97868-olx-ro-scraper-nume-nr-telefon-adrese-yahoo-skype.rst
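The regex and AJAX steps in getPhoneNum can be pulled out into small pure functions, so they can be checked without hitting the site. This is a minimal sketch, assuming the ad page contains a call shaped like `Ad.phone.show(this, '<hash>')`; only the two regexes, the AJAX URL template and the "img" field come from the script above. The helper names and the sample markup in the usage are hypothetical.

```python
import json
import re

# AJAX URL template copied from the script above: ad ID, then phone hash.
AJAX_NUM = "http://www.tocmai.ro/ajax_ad/call/%s/1/%s/"


def ad_id_from_url(url):
    """Return the numeric ad ID from an ad URL ending in -<digits>.html, or None."""
    match = re.search(r"-(\d+)\.html", url)
    return match.group(1) if match else None


def phone_hash_from_html(html):
    """Return the last quoted argument of Ad.phone.show(...) in the source, or None."""
    match = re.search(r"Ad\.phone\.show.*'(.+)'", html)
    return match.group(1) if match else None


def phone_request_url(ad_id, phone_hash):
    """Build the AJAX URL the page calls to reveal the phone number."""
    return AJAX_NUM % (ad_id, phone_hash)


def phone_from_response(body):
    """Return the "img" field of the AJAX JSON reply (None if it is missing)."""
    return json.loads(body).get("img")


if __name__ == "__main__":
    # Hypothetical sample data, not real tocmai.ro markup.
    html = "<a onclick=\"Ad.phone.show(this, 'abc123')\">Show phone</a>"
    ad_id = ad_id_from_url("http://www.tocmai.ro/anunturi/telefon-123456.html")
    print(phone_request_url(ad_id, phone_hash_from_html(html)))
```

Keeping the network calls (requests.get) out of these helpers means a change in the site's markup shows up as a failing regex on saved HTML rather than as an AttributeError halfway through a scrape.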
QUADMACHINE, posted March 5, 2015

Hi, is it written in Python?
TheOne, posted March 6, 2015

Quoting QUADMACHINE: "Hi, is it written in Python?"

You can clearly see in the screenshot that it's written in Python: "randon -> python tocmai-scraper.py"
QUADMACHINE, posted March 6, 2015

Quoting TheOne: "You can clearly see in the screenshot that it's written in Python: "randon -> python tocmai-scraper.py""

I was trying not to make a pointless post. Do you really think I'm so dumb that I can't see what language it's in? Usually I post nonsense and wake up in galaxy IX of the Orion constellation, on planet CenturionShef.