Semalt: Using Python to Scrape Websites

Web scraping, also known as web data extraction, is the process of retrieving data from the web and exporting it into usable formats. In most cases, webmasters use this technique to extract large amounts of valuable data from web pages, with the scraped data saved to a Microsoft Excel spreadsheet or a local file.

How to Scrape the Web with Python

For beginners, Python is one of the most commonly used programming languages, with a strong emphasis on code readability. Currently, Python is available as Python 2 and Python 3. The language features automatic memory management and a dynamic type system. It is also developed by its community.

Why Python?

Retrieving data from dynamic websites that require login has been a major challenge for many webmasters. In this scraping guide, you will learn how to scrape a site that requires login authorization using Python. Here is a step-by-step guide that will help you complete the scraping process efficiently.

Step 1: Studying the Target Website

To extract data from dynamic websites that require login authorization, you first need to gather the details the login form expects.

To get started, right-click the "Username" field and select "Inspect element". "Username" will be the key.

Right-click the "Password" field and select "Inspect element".

Search the page source for the hidden authentication token field. The contents of that hidden input tag will be the value you submit. However, it is important to note that different websites use different hidden input tags.

Some websites use a simple login form while others use more complicated ones. If you are working with sites that use complex structures, check the log of your login request and note the important values and keys used to log in to the website.
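
The inspection described in this step can also be done from a script. The following is a minimal sketch that lists every input field of a login form, including hidden ones, using only Python's standard library. The markup in SAMPLE_LOGIN_HTML and the field names in it (csrf_token, username, password) are assumptions made for illustration; a real site will use its own names.

```python
from html.parser import HTMLParser


class FormInputCollector(HTMLParser):
    """Collect the name, type and value of every <input> tag in a page."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            self.inputs.append({
                "name": attributes.get("name"),
                "type": attributes.get("type", "text"),
                "value": attributes.get("value", ""),
            })


# Hypothetical login page markup, for illustration only.
SAMPLE_LOGIN_HTML = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
"""


def list_form_inputs(page_html):
    """Return one dict per <input> tag found in the given HTML."""
    collector = FormInputCollector()
    collector.feed(page_html)
    return collector.inputs


if __name__ == "__main__":
    for field in list_form_inputs(SAMPLE_LOGIN_HTML):
        print(field)
```

Fields whose type is "hidden" are the ones to look at first, since they usually carry the token the site expects back with the login request.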

Step 2: Performing the Login to Your Site

In this step, create a session object that lets you persist the login session across all your requests. The second thing to consider is extracting the CSRF token from your target web page. The token is needed during login. In this case, use XPath and lxml to retrieve the token. Perform the login phase by sending a request to the login URL.
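
As a rough sketch of this step, the code below reads the token with lxml and XPath and then posts the credentials through a session. The field names (username, password, csrf_token) and the example URL in the comment are assumptions, not taken from any particular site. The session is passed in as a parameter and is expected to behave like requests.Session from the third-party requests library, which keeps cookies between calls.

```python
from lxml import html

# Hypothetical field name; use the one found when inspecting the real login form.
TOKEN_XPATH = '//input[@name="csrf_token"]/@value'


def extract_token(page_html):
    """Return the hidden CSRF token value from the login page, or None if absent."""
    tree = html.fromstring(page_html)
    values = tree.xpath(TOKEN_XPATH)
    return values[0] if values else None


def log_in(session, login_url, username, password):
    """Fetch the login page, read its token, then POST the credentials.

    `session` is expected to behave like requests.Session: it keeps cookies
    between calls and offers get() and post().
    """
    login_page = session.get(login_url)
    payload = {
        "username": username,          # assumed field name
        "password": password,          # assumed field name
        "csrf_token": extract_token(login_page.text),
    }
    # Some sites check the Referer header on login, so it is sent here.
    return session.post(login_url, data=payload, headers={"Referer": login_url})


# Possible usage, assuming the requests library is installed:
#   import requests
#   session = requests.Session()
#   response = log_in(session, "https://example.com/login", "alice", "secret")
```

Passing the session in, rather than creating it inside log_in, means the same logged-in session can be reused for every later request.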

Step 3: Scraping Data

Now you can extract data from your target site. Use XPath to identify the target element and produce the results. To validate your results, check the status code returned by each request. However, the status code does not tell you whether the login phase succeeded; it only acts as an indicator.
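
A minimal sketch of this step might look as follows. The function name scrape_text, the URL and the XPath expression in the usage comment are hypothetical. As in the login step, the session is assumed to behave like requests.Session and to be already logged in.

```python
from lxml import html


def scrape_text(session, page_url, item_xpath):
    """Fetch a page with a logged-in session and return the matched text nodes."""
    response = session.get(page_url, headers={"Referer": page_url})
    # A 2xx status only shows that the request was served. A failed login can
    # still return 200 together with the login form, so treat this as a hint.
    if not 200 <= response.status_code < 300:
        raise RuntimeError(
            f"Unexpected status code {response.status_code} for {page_url}"
        )
    tree = html.fromstring(response.text)
    # item_xpath is expected to select text nodes, e.g. '//li/text()'.
    return [text.strip() for text in tree.xpath(item_xpath)]


# Possible usage with an already logged-in session (hypothetical URL and XPath):
#   names = scrape_text(session, "https://example.com/dashboard",
#                       '//li[@class="item"]/text()')
```

Because the status code is only an indicator, it can help to also check that the returned list is not empty, or that the page contains something only a logged-in user would see.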

For scraping experts, it is important to note that the return values of XPath evaluations vary. The result depends on the XPath expression that the end user runs. Knowing how to use regular expressions in XPath and how to write XPath expressions will help you extract data from sites that require login authorization.
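
To illustrate how the return type changes with the expression, here is a small sketch using lxml on a made-up page fragment. The markup and the /project/ URL pattern are assumptions for illustration. The regular-expression filter relies on the EXSLT regular-expressions namespace, which lxml supports in XPath.

```python
from lxml import html

# Hypothetical page fragment, for illustration only.
PAGE = html.fromstring("""
<html><body>
  <h1>Projects</h1>
  <ul>
    <li><a href="/project/101">Alpha</a></li>
    <li><a href="/project/202">Beta</a></li>
    <li><a href="/about">About</a></li>
  </ul>
</body></html>
""")

# EXSLT regular-expressions namespace, bound to the "re" prefix.
REGEX_NS = {"re": "http://exslt.org/regular-expressions"}

elements = PAGE.xpath("//li/a")            # list of element objects
texts = PAGE.xpath("//li/a/text()")        # list of strings
count = PAGE.xpath("count(//li)")          # a float
heading = PAGE.xpath("string(//h1)")       # a single string
has_heading = PAGE.xpath("boolean(//h1)")  # a bool

# Keep only links whose href looks like /project/<digits>.
project_links = PAGE.xpath(
    r'//a[re:test(@href, "^/project/\d+$")]/@href', namespaces=REGEX_NS
)

if __name__ == "__main__":
    print(type(elements), type(texts), type(count), type(heading), type(has_heading))
    print(project_links)
```

Code that consumes XPath results should therefore not assume a list: an expression wrapped in count(), string() or boolean() returns a single value instead.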

With Python, you do not need a custom backup plan or to worry about a hard disk crash. Python efficiently extracts data from both static and dynamic sites that require login authorization to access their content. Take your web scraping experience to the next level by installing Python on your computer.

December 22, 2017