pr00f, posted March 5, 2015

Screenshot: http://a.pomf.se/usqyao.png

"""
tocmai.ro scraper
Gets name, city, phone no.
http://a.pomf.se/usqyao.png
"""
import re
import json
import requests
from bs4 import BeautifulSoup as b

pages = 1  # number of listing pages to scrape
catURL = "http://www.tocmai.ro/anunturi/electronice-si-electrocasnice/"
ajaxNum = "http://www.tocmai.ro/ajax_ad/call/%s/1/%s/"


def getSoup(link):
    # Downloads and parses a page once, so each ad is fetched a single time
    return b(requests.get(link).text, "html.parser")


def getName(soup):
    # Gets the name from the ad page
    tag = soup.find("a", attrs={"class": "name"})
    return tag.text if tag else None


def getLoc(soup):
    # Gets the city from the ad page
    tag = soup.find("small", attrs={"itemprop": "itemLocalitate"})
    return tag.text if tag else None


def getPhoneNum(soup, aID):
    # Gets the phone number: the ad page embeds a hash in an
    # Ad.phone.show(...) call, which is passed to the AJAX endpoint
    match = re.search(r"Ad\.phone\.show.*'(.+)'", str(soup))
    if match is None:
        return None
    resp = requests.get(ajaxNum % (aID, match.group(1))).text
    # The number comes back in the "img" field of the JSON response
    return json.loads(resp).get("img")


def main():
    for pageNum in range(0, pages):
        print("Page %d\n" % (pageNum + 1,))
        # Listing pages are addressed by an offset in steps of 20
        page = requests.get(catURL + "incepedela-" + str(pageNum * 20))
        soup = b(page.text, "html.parser")
        links = soup.find_all("a", attrs={"class": "record_title"})
        for item in links:
            url = item['href']
            # The ad ID is the number right before ".html" in the ad URL
            aID = re.search(r".*-(\d+)\.html", url).group(1)
            adSoup = getSoup(url)
            print(aID)
            print("\tName: %s" % (getName(adSoup),))
            print("\tCity: %s" % (getLoc(adSoup),))
            phone = getPhoneNum(adSoup, aID)
            if phone is not None:
                print("\tPhone: %s" % phone)


if __name__ == "__main__":
    main()

OLX scraper: https://rstforums.com/forum/97868-olx-ro-scraper-nume-nr-telefon-adrese-yahoo-skype.rst

Edited March 6, 2015 by pr00f
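In the scraper above, the phone number is the only field not read straight off the ad page: the ad ID comes from the ad URL, a hash comes from an inline Ad.phone.show(...) call, and both are plugged into the ajax_ad/call URL, whose JSON reply carries the number in its "img" field. Below is a minimal offline sketch of just that parsing step, using only the standard library so it can be tried without hitting the site. The helper names and the sample URL, script snippet and JSON are hypothetical, assuming the markup looks the way the script's regexes expect.

```python
import json
import re

# URL template taken from the scraper: ad ID first, then the phone hash
AJAX_NUM = "http://www.tocmai.ro/ajax_ad/call/%s/1/%s/"


def ad_id_from_url(url):
    # Hypothetical helper: the ad ID is the run of digits just before ".html"
    m = re.search(r".*-(\d+)\.html", url)
    return m.group(1) if m else None


def phone_hash_from_html(html):
    # Hypothetical helper: same pattern the scraper uses; captures the last
    # single-quoted string on the line containing Ad.phone.show, or None
    m = re.search(r"Ad\.phone\.show.*'(.+)'", html)
    return m.group(1) if m else None


def phone_ajax_url(url, html):
    # Builds the AJAX URL, or returns None if either piece is missing
    aid = ad_id_from_url(url)
    phash = phone_hash_from_html(html)
    if aid is None or phash is None:
        return None
    return AJAX_NUM % (aid, phash)


def number_from_response(text):
    # The endpoint's JSON reply is assumed to carry the number in "img"
    return json.loads(text).get("img")


if __name__ == "__main__":
    url = "http://www.tocmai.ro/anunturi/telefon-samsung-12345.html"
    html = "<script>Ad.phone.show(123, 'abc123');</script>"
    print(phone_ajax_url(url, html))
    # prints http://www.tocmai.ro/ajax_ad/call/12345/1/abc123/
```

Because the pattern uses a greedy .*, it grabs the last single-quoted string on that line, so it only works if the hash is the final quoted value there; a non-greedy .*? would pick the first one instead.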
QUADMACHINE, posted March 5, 2015

Hi, is this written in Python?
TheOne, posted March 6, 2015

Quoting QUADMACHINE: "Hi, is this written in Python?"

You can clearly see in the image that it's written in Python: "randon -> python tocmai-scraper.py"
QUADMACHINE, posted March 6, 2015

Quoting TheOne: "You can clearly see in the image that it's written in Python: 'randon -> python tocmai-scraper.py'"

I was trying not to make a pointless post. Do you really think I'm so dumb I can't tell what language it's in? Usually I just post at random and wake up in galaxy IX of the Orion constellation, on planet CenturionShef.