Semalt Expert: Python and BeautifulSoup. Scrape Sites with Ease

When you work on data analysis or machine learning projects, you may need to scrape websites to obtain the data required to complete your project. The Python programming language has a powerful collection of tools and modules that can be used for this purpose. For example, you can use the BeautifulSoup module for HTML parsing.

Here, we take a look at BeautifulSoup and find out why it is now so widely used in web scraping.

BeautifulSoup features

- It provides various methods for navigating, searching and modifying parse trees, so you can dissect a document and extract everything you need without writing much code.

- It automatically converts incoming documents to Unicode and outgoing documents to UTF-8. This means you do not have to worry about encodings, provided that the document specifies an encoding or Beautiful Soup can autodetect it.

- BeautifulSoup sits on top of other popular Python parsers such as html5lib and lxml, which lets you try different parsing strategies. One disadvantage of this module, however, is that it provides this flexibility at the expense of speed.
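
The features above can be illustrated with a minimal sketch. The sample document, its tag ids and class names are invented for this example, and the built-in `html.parser` is used only so the snippet runs without extra parsers; `lxml` or `html5lib` could be named instead if they are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, stored as Latin-1 bytes with a declared charset.
html_bytes = (
    '<html><head><meta charset="iso-8859-1"><title>Menu</title></head>'
    '<body><ul id="drinks">'
    '<li class="hot">café</li><li class="cold">juice</li>'
    '</ul></body></html>'
).encode("iso-8859-1")

# Incoming bytes are decoded to Unicode while the parse tree is built.
soup = BeautifulSoup(html_bytes, "html.parser")

# Navigate: attribute access walks to the first tag with that name.
title = soup.title.string

# Search: find_all returns every matching tag.
items = [li.get_text() for li in soup.find_all("li")]

# Modify: replace the text of one tag in place.
soup.find("li", class_="cold").string = "iced tea"

# Outgoing document: encode() produces UTF-8 bytes by default.
utf8_bytes = soup.encode()

print(title)
print(items)
print(utf8_bytes.decode("utf-8"))
```

Swapping the second argument of `BeautifulSoup(...)` is how a different underlying parser would be selected.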

What do you need to scrape a website with BeautifulSoup?

To start working with BeautifulSoup, you need a Python programming environment (either local or server-based) set up on your machine. Python is usually pre-installed on OS X, but if you use Windows, you will need to download and install it from the official website.

You also need to have the BeautifulSoup and Requests modules installed.
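
One way to confirm that both modules are available is a quick check like the sketch below. The helper name `missing_modules` is made up for this example. If a module is reported missing, it is commonly installed with `pip install beautifulsoup4 requests`.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "bs4" is the import name of the beautifulsoup4 package.
missing = missing_modules(["bs4", "requests"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("BeautifulSoup and Requests are both available.")
```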

Finally, being familiar and comfortable with HTML tagging and structure is very useful, since you will be working with web-sourced data.

Importing the Requests and BeautifulSoup libraries

With your Python programming environment set up, you can now create a new file (using nano, for example) with any name you like.

The Requests library lets you use HTTP in a human-readable form within your Python programs, while BeautifulSoup handles the parsing so that the scraping gets done faster. You can use the import statement to bring in both libraries.
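
As a minimal sketch, the top of such a file could look like this, assuming both packages are installed (BeautifulSoup is imported from the `bs4` package):

```python
import requests                  # HTTP requests
from bs4 import BeautifulSoup    # HTML parsing
```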

How to collect and parse a web page

Use the requests.get() method to fetch the web page from which you want to extract data. Next, create a BeautifulSoup object, or parse tree. This object takes the document returned by Requests as its argument and parses it. With the page collected, parsed and set up as a BeautifulSoup object, you can proceed to collect the data you need.
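
The two steps can be sketched as follows. The function names `parse_page` and `fetch_soup`, the timeout value and the example URL are assumptions made for this illustration, not part of either library.

```python
import requests
from bs4 import BeautifulSoup


def parse_page(html):
    """Build a BeautifulSoup parse tree from an HTML document."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url, timeout=10):
    """Download `url` with requests.get and return the parsed page."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # stop early on HTTP errors such as 404
    return parse_page(response.text)


# Example call with a placeholder URL (left commented out, since it needs network access):
# soup = fetch_soup("https://example.com/")
# print(soup.title.string)
```

Keeping the parsing step in its own function makes it possible to try the code on a saved or hand-written HTML string before pointing it at a live site.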

Extracting the desired text from the parsed web page

Whenever you want to collect web data, you need to know how that data is described by the Document Object Model (DOM) of the web page. In your web browser, right-click (if you use Windows) or CTRL + click (if you use macOS) on one of the items that form part of the data of interest. For example, if you want to extract data about students' nationalities, click on one of the student names. A context menu pops up, and in it you will see an item such as Inspect Element (in Firefox) or Inspect (in Chrome). Click the relevant Inspect menu item, and the web developer tools will appear in your browser.
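
Once the inspector shows which tags hold the data, the same tags can be targeted from Python. The sketch below uses invented markup: the table id, the `name` and `nationality` class names and the student records are all hypothetical stand-ins for whatever the inspector reveals on a real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, as it might appear in the browser's inspector.
html = """
<table id="students">
  <tr><td class="name">Ama Mensah</td><td class="nationality">Ghanaian</td></tr>
  <tr><td class="name">Luis Ortega</td><td class="nationality">Mexican</td></tr>
</table>
"""


def extract_students(soup):
    """Return (name, nationality) pairs, one per table row."""
    rows = []
    for tr in soup.find_all("tr"):
        name = tr.find("td", class_="name")
        nationality = tr.find("td", class_="nationality")
        # Skip rows that lack either cell.
        if name is not None and nationality is not None:
            rows.append((name.get_text(strip=True), nationality.get_text(strip=True)))
    return rows


soup = BeautifulSoup(html, "html.parser")
students = extract_students(soup)
for name, nationality in students:
    print(f"{name}: {nationality}")
```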

BeautifulSoup is a simple yet powerful HTML parsing tool that gives you great flexibility when scraping websites. When using it, do not forget to observe general scraping rules: check the website's Terms and Conditions, revisit the site regularly, and update your code whenever the site changes. With this knowledge of scraping websites with Python and BeautifulSoup, you can now easily obtain the web data you need for your project.

December 22, 2017