pr00f Posted February 25, 2015 (edited)

"""
OLX.ro scraper
Gets name, phone no., Yahoo! & Skype addresses, where applicable
http://a.pomf.se/pjmwvx.png
"""
import re
import json
import requests
from bs4 import BeautifulSoup as b

pages = 1  # How many pages should be scraped

# Category URL, a.k.a. where to get the ads from
catURL = "http://olx.ro/electronice-si-electrocasnice/laptop-calculator/"

# Links to the Ajax requests
ajaxNum = "http://olx.ro/ajax/misc/contact/phone/"
ajaxYah = "http://olx.ro/ajax/misc/contact/communicator/"
ajaxSky = "http://olx.ro/ajax/misc/contact/skype/"


def getName(link):
    # Get the name from the ad
    page = requests.get(link)
    soup = b(page.text, "html.parser")
    match = soup.find(attrs={"class": "block color-5 brkword xx-large"})
    if match is None:
        return None  # Name element not found on the ad page
    return match.get_text(strip=True)


def getPhoneNum(aID):
    # Get the phone number
    resp = requests.get("%s%s/" % (ajaxNum, aID)).text
    try:
        resp = json.loads(resp).get("value")
    except ValueError:
        return None  # No phone number
    if not resp:
        return None
    if "span" in resp:
        # Multiple phone numbers: return the first non-blank one
        nums = b(resp, "html.parser").find_all(text=True)
        for num in nums:
            if num.strip():
                return num
        return None
    return resp


def getYahoo(aID):
    # Get the Yahoo! ID
    resp = requests.get("%s%s/" % (ajaxYah, aID)).text
    try:
        return json.loads(resp).get("value")
    except ValueError:
        return None  # No Yahoo! ID


def getSkype(aID):
    # Get the Skype ID
    resp = requests.get("%s%s/" % (ajaxSky, aID)).text
    try:
        return json.loads(resp).get("value")
    except ValueError:
        return None  # No Skype ID


def main():
    for pageNum in range(1, pages + 1):
        print("Page %d." % pageNum)
        page = requests.get(catURL + "?page=" + str(pageNum))
        soup = b(page.text, "html.parser")
        links = soup.findAll(
            attrs={"class": "marginright5 link linkWithHash detailsLink"})
        for a in links:
            aID = re.search(r"ID(.+)\.", a["href"]).group(1)
            print("ID: %s" % aID)
            print("\tName: %s" % getName(a["href"]))
            # Query each contact endpoint once and reuse the result
            phone = getPhoneNum(aID)
            if phone is not None:
                print("\tPhone: %s" % phone)
            yahoo = getYahoo(aID)
            if yahoo is not None:
                print("\tYahoo: %s" % yahoo)
            skype = getSkype(aID)
            if skype is not None:
                print("\tSkype: %s" % skype)


if __name__ == "__main__":
    main()

Tocmai scraper: https://rstforums.com/forum/98245-tocmai-ro-scraper-nume-oras-numar-telefon.rst

Edited March 6, 2015 by pr00f
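The multiple-numbers branch of getPhoneNum returns only the first number it finds. Below is a minimal sketch of collecting all of them instead. It assumes the Ajax endpoint answers with JSON shaped like {"value": "<span>...</span> <span>...</span>"}, which is inferred from the script above and not verified against the site. parse_phone_numbers is a hypothetical helper, and it strips tags with a regex rather than BeautifulSoup only to stay within the standard library.

```python
import json
import re


def parse_phone_numbers(raw):
    """Return all phone numbers found in a raw Ajax response body.

    Hypothetical helper: assumes the body is JSON with a "value" key that
    holds either a plain number or several numbers wrapped in <span> tags.
    """
    try:
        data = json.loads(raw)
    except ValueError:
        return []  # Body was not JSON, treat as "no phone number"
    if not isinstance(data, dict):
        return []
    value = data.get("value")
    if not value:
        return []
    # Split on HTML tags and keep only the non-blank text pieces
    pieces = re.split(r"<[^>]+>", value)
    return [piece.strip() for piece in pieces if piece.strip()]
```

getPhoneNum could then pass the response text to this helper and print every returned entry, at the cost of no longer handling markup that a real HTML parser would cope with better.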
UnixDevel Posted February 26, 2015
The idea is that it's easier through the mobile API, because there you have no limits on scraping, whereas here, if I remember correctly, there were some limits. I know because I have another scraper that I wrote myself for ads.
florinul Posted February 28, 2015
I set the number of pages to 500. After page 10 it throws an error and doesn't extract anything anymore... is there some kind of protection on the site?
WerBF Posted February 28, 2015
I'm sure there is. You're sending too many requests in too short a time.
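If the errors do come from sending requests too quickly, one common workaround is to pause between requests and retry failed ones with growing delays. The sketch below illustrates that idea only; fetch_with_backoff, its parameters, and the delay values are hypothetical, and how the site actually limits requests is not known from this thread.

```python
import time


def fetch_with_backoff(fetch, url, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponentially growing pauses on failure.

    `fetch` is any callable that returns a response or raises an exception,
    for example a small wrapper around requests.get that also calls
    raise_for_status(). All names and defaults here are assumptions.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise  # Out of retries, let the caller see the error
            # Wait base_delay, then 2x, then 4x, ... before the next try
            sleep(base_delay * 2 ** attempt)
```

In the scraper, the requests.get calls could be routed through this helper, with an extra time.sleep between pages to keep the overall request rate low.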
pr00f (Author) Posted July 6, 2015 (edited)
Quoted: "It's in C++, right? @pr00f"
python
Edited July 6, 2015 by pr00f
MrGrj (Active Members) Posted July 6, 2015
Quoted: "It's in C++, right? @pr00f"
It's Python. Before asking questions like that, do a little research first. pr00f's answer is ironic. Don't start writing scrapers in Pascal.
Sir-Galahad Posted July 6, 2015
Note that you have a lot of topics where you ask for x or y. Start giving something back as well.
Vlachs Posted July 17, 2015
Quoted: "It's Python. Before asking questions like that, do a little research first. pr00f's answer is ironic. Don't start writing scrapers in Pascal."
Why not, Pascal rulez
pr00f (Author) Posted July 18, 2015
Quoted: "import requests ImportError: No module named requests"
sudo apt-get install python-requests
Or install it from pip, or with whatever package manager you use, or just search the internet.
hades Posted July 31, 2015
Interesting. There are a few small things that could be improved, either functionally or in terms of PEP 8. Maybe I'll find time this weekend and, with pr00f's agreement, make some changes to it. Unless I get carried away and forget.
shuttershades Posted September 25, 2015
Class: PHP OLX Classifieds Scraper - PHP Classes
Georgee7 Posted May 18, 2018
Hi, if I only have the link to a user's profile on OLX, and that user has no ads now but had some in the past, can I still find out their phone number?