Che, posted March 30, 2021

I want to fetch the source of this page in Python: https://www.whoscored.com/Matches/1485370/Live/England-Premier-League-2020-2021-Brighton-Leicester

This is the Python script:

    import ssl
    import requests

    try:
        _create_unverified_https_context = ssl._create_unverified_context
    except AttributeError:
        pass
    else:
        ssl._create_default_https_context = _create_unverified_https_context

    url = 'https://www.whoscored.com/Matches/1485370/Live/England-Premier-League-2020-2021-Brighton-Leicester'
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-ch-ua': '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"',
        'sec-ch-ua-mobile': '?0',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36'
    }

    response = requests.get(url)
    print(response.content)

The problem is that the site won't let me in. Either I get an error saying the certificate is not valid (which is why I put the try block for the certificate at the top of the code), or the request is simply refused because of some problem with Incapsula. I don't understand why it is refused, since I am practically emulating a browser perfectly and it still somehow figures out that it is not a real one. Can someone help me, please? Thanks a lot!

FoxBlood, posted March 30, 2021

At first I thought you had not set everything up properly, so I added a few more headers. Then I noticed that you never actually pass the headers to the request (by then, of course, I had already earned myself a cooldown). This code works fine for me:

    import requests

    # try:
    #     _create_unverified_https_context = ssl._create_unverified_context
    # except AttributeError:
    #     pass
    # else:
    #     ssl._create_default_https_context = _create_unverified_https_context

    url = 'https://www.whoscored.com/Matches/1485370/Live/England-Premier-League-2020-2021-Brighton-Leicester'
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        # 'br' is only decoded by requests if a Brotli package is installed
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'en-US,en;q=0.5',
        'Connection': 'keep-alive',
        'DNT': '1',
        'Host': 'www.whoscored.com',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'
    }

    s = requests.Session()
    # The headers have to be passed explicitly, otherwise they are never sent
    response = s.get(url, allow_redirects=True, headers=headers)
    print(response.content)

After running this you might think you are not getting what you want and that you have been blocked; that was my first impression too. Then I saw that I had not been put on cooldown: you do receive everything, you just cannot execute the page's JavaScript, so you cannot see all of the content with the requests module alone. The JavaScript is obfuscated, obviously. I think this module is useful if you are too lazy to work out how it is decoded: https://pypi.org/project/requests-html/

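The point about the `headers` dictionary being defined but never sent can be turned into a small guard. This is only a sketch under assumptions: `missing_headers`, `fetch`, and `REQUIRED_HEADERS` are hypothetical names that are not part of requests, and the choice of which headers count as required is arbitrary. requests is imported lazily so that the check itself runs without the library or the network.

```python
# Hypothetical guard: refuse to send a request when browser-like headers
# were defined but not actually passed along.
REQUIRED_HEADERS = ("User-Agent", "Accept")  # assumption: minimum for this sketch


def missing_headers(headers, required=REQUIRED_HEADERS):
    """Return the required header names absent from `headers` (case-insensitive)."""
    present = {name.lower() for name in (headers or {})}
    return [name for name in required if name.lower() not in present]


def fetch(url, headers=None, timeout=10):
    """GET `url` through a Session, failing early if headers were forgotten."""
    missing = missing_headers(headers)
    if missing:
        raise ValueError("missing request headers: " + ", ".join(missing))
    import requests  # third-party; imported here so the check above works offline

    with requests.Session() as session:
        return session.get(url, headers=headers, allow_redirects=True, timeout=timeout)
```

With this in place, calling `fetch(url)` without `headers=` raises `ValueError` instead of silently sending the default `python-requests/...` User-Agent.
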
Zatarra, posted April 1, 2021

Have you tried with verify=False?

    response = requests.get(url, verify=False)

Che (author), posted April 2, 2021

@FoxBlood How do you get that JS code? Every time I run it, all I get is this:

    b'<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><script type="text/javascript" src="/_Incapsula_Resource?SWJIYLWA=719d34d31c8e3a6e6fffd425f7e032f3"></script></head><body style="margin:0px;height:100%"><iframe id="main-iframe" src="/_Incapsula_Resource?SWUDNSAI=30&xinfo=7-127053514-0%200NNN%20RT%281617365412939%2074%29%20q%280%20-1%20-1%201%29%20r%280%20-1%29%20B12%2811%2c8628%2c0%29%20U18&incident_id=875000100355227794-412533605997937287&edet=12&cinfo=0b000000&rpinfo=0&cts=nwc8yt0AuMjh0gpLuhos1IuJJSpDlyUzeNlHAg8f8pgX3fHR3fTHQ1klHRlgMFI6" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 875000100355227794-412533605997937287</iframe></body></html>'

@Zatarra I tried that and it still doesn't work. On top of that, I now also get an SSL message saying that skipping verification is a bad idea, which is actually a warning from urllib3. The response shown above stays the same.

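Since the block page still comes back as an ordinary HTML body, it helps to detect it in code rather than by eye. A minimal sketch, assuming the phrase "Incapsula incident ID:" seen in the response quoted above is a reliable marker; `incapsula_incident_id` is a hypothetical helper, not part of requests or of any Incapsula tooling.

```python
import re

# Assumption: the block page always carries this phrase, as in the response above.
_INCIDENT_RE = re.compile(r"Incapsula incident ID:\s*(\d+-\d+)")


def incapsula_incident_id(body):
    """Return the incident ID if `body` looks like an Incapsula block page, else None."""
    if isinstance(body, bytes):
        # response.content is bytes; decode leniently so odd bytes cannot raise
        body = body.decode("utf-8", errors="replace")
    match = _INCIDENT_RE.search(body)
    return match.group(1) if match else None
```

Passing `response.content` to this helper gives the incident ID for a blocked request and `None` otherwise, which makes it easy to tell a block apart from a real page before parsing anything.
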
Elohim2, posted September 11, 2021

For future reference, it would be more useful to use PhantomJS for pages like this, even for CF.

Che (author), posted September 12, 2021

On 9/11/2021 at 12:40 PM, Elohim2 said: "For future reference, it would be more useful to use PhantomJS for pages like this, even for CF."

Why? What is wrong with using Selenium? What is the difference? Thanks in advance!

gigiRoman, posted September 12, 2021

PhantomJS is headless Chrome only. Selenium is multi-browser. For GUI tests across several browsers, Selenium is the recommended choice: https://stackoverflow.com/questions/14099770/casperjs-phantomjs-vs-selenium

Che (author), posted September 12, 2021

20 minutes ago, gigiRoman said: "PhantomJS is headless Chrome only. Selenium is multi-browser. For GUI tests across several browsers, Selenium is the recommended choice: https://stackoverflow.com/questions/14099770/casperjs-phantomjs-vs-selenium"

So you are saying that PhantomJS is just Chrome and cannot be detected? But Selenium is Chrome as well: you download chromedriver and use that, which is practically Chrome.

gigiRoman, posted September 12, 2021

3 minutes ago, Che said: "So you are saying that PhantomJS is just Chrome and cannot be detected? But Selenium is Chrome as well: you download chromedriver and use that, which is practically Chrome."

Both PhantomJS and Selenium are test automation frameworks. PhantomJS uses only the Chrome driver, whereas with Selenium you can automate multiple browsers. Neither is built to keep you from being detected; they are meant for test environments. Note that they even send user agents stating that the request comes from Selenium or PhantomJS respectively. So you have to take care to use a proxy, rewrite the user agents, and so on.

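The remark about tell-tale user agents can be checked before a request is ever sent. The sketch below is an assumption-laden illustration: `automation_markers` and `AUTOMATION_MARKERS` are hypothetical names, and the list only covers tokens that the default User-Agent strings of headless Chrome, PhantomJS, and requests are known to contain. A real anti-bot service looks at far more than this one header.

```python
# Assumption: tokens that the default User-Agent strings of these tools contain.
AUTOMATION_MARKERS = ("HeadlessChrome", "PhantomJS", "python-requests")


def automation_markers(user_agent):
    """Return the markers found in `user_agent`, compared case-insensitively."""
    lowered = user_agent.lower()
    return [marker for marker in AUTOMATION_MARKERS if marker.lower() in lowered]
```

An empty list for the User-Agent you are about to send means that at least this header does not announce the tool; it says nothing about the other signals a site may inspect.
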