
Semalt: Using Python to Scrape Websites


Web scraping, also known as web data extraction, is the process of retrieving data from the web and exporting it into usable formats. In most cases, webmasters use this process to extract large amounts of valuable data from web pages, and the scraped data is saved to Microsoft Excel or to a local file.

How to Scrape the Web with Python

For beginners, Python is one of the most commonly used programming languages, with a strong emphasis on code readability. Currently, Python is available as Python 2 and Python 3. The language features automatic memory management and a dynamic type system. Python is also developed through a community-based process.

Why Python?

Retrieving data from dynamic websites that require a login has been a significant challenge for many webmasters. In this scraping tutorial, you will learn how to scrape a site that requires login authorization using Python. Here is a step-by-step guide that will help you complete the scraping process efficiently.

Step 1: Studying the Target Website

To extract data from dynamic websites that require a login, you first need to gather the details that the login form expects.

To get started, right-click the "Username" field and select "Inspect element". The name of that field (for example, "username") will be the key used in the login request.

Right-click the "Password" field and select "Inspect element".

Search the page source for the hidden input that carries the authentication token. The value of that hidden input tag is the value you will submit. However, it is important to note that different websites use different hidden input tags.

Some websites use a simple login form while others use more complicated forms. If you are working with sites that use complex structures, check your browser's log of the login request and note the important keys and values used to log in to the website.
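As a rough illustration of this inspection step, the sketch below lists the named input fields of a login form, including hidden ones. The markup and the field names (username, password, csrf_token) are hypothetical; a real site will use its own names. It uses the standard library's html.parser only so that it runs without extra packages.

```python
from html.parser import HTMLParser

# Hypothetical login page markup; real sites will use different names.
SAMPLE_LOGIN_HTML = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
"""


class InputCollector(HTMLParser):
    """Collect the name, type and value of every named <input> tag."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Inputs without a name are not submitted with the form, so skip them.
        if "name" in attributes:
            self.fields.append(
                {
                    "name": attributes["name"],
                    "type": attributes.get("type", "text"),
                    "value": attributes.get("value") or "",
                }
            )


def list_form_fields(html_text):
    """Return the named input fields found in the given HTML text."""
    collector = InputCollector()
    collector.feed(html_text)
    return collector.fields


fields = list_form_fields(SAMPLE_LOGIN_HTML)
# Hidden fields usually carry tokens that must be sent back with the login.
hidden_fields = {f["name"]: f["value"] for f in fields if f["type"] == "hidden"}

for field in fields:
    print(field["type"], field["name"], repr(field["value"]))
```

The keys printed here are the ones the login request in the next step would need to send.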

Step 2: Logging In to the Site

In this step, create a session object that lets you persist the login session across all of your requests. The second thing to do is extract the "csrf token" from your target web page. The token is needed during login. In this case, use XPath and lxml to retrieve the token. Then perform the login by sending a request to the login URL.
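The login flow described here might look roughly like the sketch below. It is an assumption-laden outline, not a definitive implementation: the URL and the field names are hypothetical placeholders, and instead of lxml and XPath it swaps in the standard library (urllib with a cookie jar for the session, html.parser for the token) so that it has no third-party dependencies.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import CookieJar

# Hypothetical values: replace with the URL and field names found in Step 1.
LOGIN_URL = "https://example.com/login"
USERNAME_FIELD = "username"
PASSWORD_FIELD = "password"
TOKEN_FIELD = "csrf_token"


def make_session():
    """Return an opener that keeps cookies across requests, like a session."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )


class TokenFinder(HTMLParser):
    """Remember the value of the input whose name matches field_name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == self.field_name:
            self.token = attributes.get("value")


def extract_token(html_text, field_name=TOKEN_FIELD):
    """Return the hidden token value from the login page, or None."""
    finder = TokenFinder(field_name)
    finder.feed(html_text)
    return finder.token


def build_login_request(token, username, password):
    """Build the POST request that submits the credentials and the token."""
    payload = {
        USERNAME_FIELD: username,
        PASSWORD_FIELD: password,
        TOKEN_FIELD: token,
    }
    data = urllib.parse.urlencode(payload).encode("utf-8")
    # Supplying data makes urllib send a POST instead of a GET.
    return urllib.request.Request(
        LOGIN_URL, data=data, headers={"Referer": LOGIN_URL}
    )


def login(session, username, password):
    """Fetch the login page, read the token, then post the credentials."""
    with session.open(LOGIN_URL) as response:
        page = response.read().decode("utf-8", errors="replace")
    token = extract_token(page)
    if token is None:
        raise ValueError("no token field found on the login page")
    request = build_login_request(token, username, password)
    with session.open(request) as response:
        return response.status
```

Calling login(make_session(), "user", "password") would perform the two requests against the placeholder URL; the same session object should then be reused for every later request so that the login cookies are sent along.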

Step 3: Scraping the Data

You can now extract data from your target site. Use XPath to identify the target element and produce the results. To validate your results, check the status code of each request. However, a successful status code does not tell you whether the login itself succeeded; it only acts as an indicator.
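To make the extraction and the status check concrete, here is a minimal sketch. The page snippet, the "project" class, and the function names are hypothetical. It uses xml.etree.ElementTree from the standard library, which supports only a limited subset of XPath and needs well-formed markup; for real pages, lxml as named in the text is usually the more practical choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a page fetched after login.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="project"><a href="/p/1">alpha</a></div>
    <div class="project"><a href="/p/2">beta</a></div>
    <div class="footer"><a href="/about">about</a></div>
  </body>
</html>
"""


def extract_project_names(page_text):
    """Return the link text inside every div whose class is 'project'."""
    root = ET.fromstring(page_text.strip())
    links = root.findall(".//div[@class='project']/a")
    return [link.text for link in links]


def request_succeeded(status_code):
    """True for any 2xx status; this says nothing about whether login worked."""
    return 200 <= status_code < 300


print(extract_project_names(SAMPLE_PAGE))
print(request_succeeded(200), request_succeeded(403))
```

Because a 2xx status only shows that the request was served, a more reliable check is to look for content that appears only for logged-in users, such as the expected elements coming back non-empty.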

For scraping experts, it is important to note that the return values of XPath evaluations vary. The results depend on the XPath expression that the end user runs. Knowing how to use regular expressions with XPath and how to write XPath expressions will help you extract data from sites that require login authorization.
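The point about varying return values can be illustrated with a small sketch. The markup and ids below are hypothetical, and the example again relies on the standard library's limited XPath support rather than lxml; the regular expression is applied in Python with the re module after the path query, which is a swapped-in approach rather than a regular expression inside the XPath itself.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical list markup used only for illustration.
SAMPLE = """
<ul>
  <li id="item-1">alpha 10</li>
  <li id="item-2">beta</li>
  <li id="note">gamma 30</li>
</ul>
"""

root = ET.fromstring(SAMPLE.strip())

# The same path gives different kinds of results depending on the call used.
all_items = root.findall("./li")     # a list of Element objects
first_item = root.find("./li")       # a single Element, or None if no match
first_text = root.findtext("./li")   # the text of the first match as a string
missing = root.find("./table")       # None, because nothing matches

# Filter the matched elements with a regular expression on their id attribute.
numbered = [
    li.text for li in all_items if re.fullmatch(r"item-\d+", li.get("id", ""))
]

print(len(all_items), first_item.tag, first_text, missing, numbered)
```

Code that consumes query results therefore has to handle each shape (list, single element, string, or nothing) explicitly rather than assume one of them.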

With Python, you do not need a custom backup plan, and you do not need to worry about disk crashes. Python can extract data from both static and dynamic sites that require login authorization to access their content. Take your web scraping experience to the next level by installing Python on your computer.

December 22, 2017