
Semalt Islamabad Expert - What You Need to Know About a Web Crawler


A search engine crawler is an automated application, script or program that traverses the World Wide Web in a systematic way to supply a particular search engine with up-to-date information. Have you ever wondered why you get different sets of results each time you type the same keywords into Bing or Google? It is because web pages are being uploaded every minute, and as they are uploaded, web crawlers run over the new pages.

Michael Brown, a leading Semalt expert, explains that web crawlers, also known as automatic indexers and web spiders, run on different algorithms for different search engines. The crawling process begins by identifying the URLs that should be visited, either because they have just been uploaded or because some of their pages have fresh content. In search engine terminology, these identified URLs are known as seeds.

These URLs are visited, and later revisited, depending on how often new content is uploaded to them and on the policies that guide the spiders. During each visit, all the hyperlinks on every page are identified and added to the list of URLs to visit. It is important to state clearly that different search engines use different algorithms and policies. That is why Google's results and Bing's results for the same keywords differ, even though they also have a great deal in common.
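The visit-and-extract loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FAKE_WEB` is a hypothetical in-memory stand-in for the internet, `crawl` and `LinkExtractor` are names invented for this example, and the breadth-first order is one simple choice, not how any particular search engine orders its visits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web" (URL -> HTML); a real crawler would fetch over HTTP.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.com/b": '<a href="/">home</a>',
    "http://example.com/c": "no links here",
}


def crawl(seeds, fetch, max_pages=100):
    """Visit pages breadth-first starting from the seeds; return the visit order."""
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, so none is queued twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:      # page could not be fetched
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


order = crawl(["http://example.com/"], FAKE_WEB.get)
print(order)
```

Starting from a single seed, every page reachable through hyperlinks ends up in the list exactly once, because the `seen` set keeps already-discovered URLs from being queued again.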

Web crawlers do a tremendous amount of work to keep search engines up to date. In fact, their job is very difficult, for the three reasons below.

1. The volume of web pages on the internet at any given time. There are many millions of sites on the web, and more are launched every day. The greater the volume of websites on the net, the harder it is for crawlers to stay up to date.

2. The pace at which websites are launched. Do you have any idea how many new websites go live every day?

3. The frequency with which content changes, even on existing websites, along with the addition of dynamic pages.

These are the three issues that make it hard for web spiders to stay up to date. Instead of crawling websites on a first-come, first-served basis, many web spiders prioritize web pages and hyperlinks. The prioritization rests on four general search engine crawler policies.

1. The selection policy is used to choose which pages are downloaded for crawling first.

2. The re-visit policy is used to determine when and how often web pages are revisited to check for possible changes.

3. The parallelization policy is used to coordinate how crawlers are distributed so that all the seeds are covered quickly.

4. The politeness policy is used to determine how URLs are crawled so that websites are not overloaded.
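To make three of these policies concrete, here is a minimal Python sketch, not a description of any real search engine's internals. `PoliteScheduler` and `next_revisit_interval` are hypothetical names, the two-unit delay per host is an arbitrary assumption, and the halve-or-double rule for revisits is just one simple heuristic chosen for illustration. Parallelization is left out.

```python
import heapq
from urllib.parse import urlparse


class PoliteScheduler:
    """Orders URLs by priority (selection) and spaces out requests per host (politeness)."""

    def __init__(self, delay_per_host=2.0):
        self.delay = delay_per_host
        self.heap = []            # entries: (priority, insertion order, url)
        self.counter = 0          # tie-breaker so equal priorities stay in insertion order
        self.next_allowed = {}    # host -> earliest time the host may be contacted again

    def add(self, url, priority):
        """Queue a URL; a lower priority number means it is crawled sooner."""
        heapq.heappush(self.heap, (priority, self.counter, url))
        self.counter += 1

    def pop_ready(self, now):
        """Return the best-priority URL whose host may be contacted at time `now`, else None."""
        deferred = []
        chosen = None
        while self.heap:
            entry = heapq.heappop(self.heap)
            host = urlparse(entry[2]).netloc
            if self.next_allowed.get(host, 0.0) <= now:
                self.next_allowed[host] = now + self.delay
                chosen = entry[2]
                break
            deferred.append(entry)    # host is still cooling down; try the next URL
        for entry in deferred:
            heapq.heappush(self.heap, entry)
        return chosen


def next_revisit_interval(current_interval, page_changed, minimum=1.0, maximum=64.0):
    """Assumed re-visit heuristic: halve the interval if the page changed, else double it."""
    proposed = current_interval / 2 if page_changed else current_interval * 2
    return max(minimum, min(maximum, proposed))


scheduler = PoliteScheduler(delay_per_host=2.0)
scheduler.add("http://a.example/1", priority=0)
scheduler.add("http://a.example/2", priority=1)
scheduler.add("http://b.example/1", priority=2)

first = scheduler.pop_ready(now=0.0)
second = scheduler.pop_ready(now=0.0)   # a.example is cooling down, so b.example goes next
third = scheduler.pop_ready(now=1.0)    # both hosts are still cooling down
fourth = scheduler.pop_ready(now=2.0)   # a.example may be contacted again
```

The scheduler takes the current time as an argument instead of reading a clock, which keeps the example deterministic. A real crawler would more likely consult each site's `robots.txt` and measure real elapsed time.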

For fast and accurate coverage of the seeds, crawlers need a strong crawling technique that allows web pages to be prioritized and narrowed down, and they also need a highly optimized architecture. Together, these two make it easier for them to crawl and download hundreds of millions of web pages in a few weeks.

In an ideal situation, each web page is pulled from the World Wide Web and passed through a multi-threaded downloader, after which the web pages or URLs are queued before being handed to a dedicated scheduler that assigns priorities. The prioritized URLs are then passed through the multi-threaded downloader again so that their metadata and text can be stored for proper crawling.
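A simplified version of this pipeline can be sketched in Python as follows. Several things here are assumptions made for the example: `PAGES` is a hypothetical stand-in for the network, `priority` is an invented scoring rule (shallower paths first), and the two download passes described above are collapsed into one to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical stand-in for the network: URL -> (title, text).
PAGES = {
    "http://example.com/news": ("News", "fresh headlines"),
    "http://example.com/about": ("About", "who we are"),
    "http://example.com/archive/2001": ("Archive", "old posts"),
}


def download(url):
    """Pretend to fetch a page; returns (url, title, text)."""
    title, text = PAGES[url]
    return url, title, text


def priority(url):
    """Assumed scoring rule: fewer slashes means a shallower page, crawled sooner."""
    return url.rstrip("/").count("/")


def run_pipeline(urls, workers=4):
    # Scheduler: queue the URLs, then release them in priority order.
    heap = [(priority(u), u) for u in urls]
    heapq.heapify(heap)
    ordered = [heapq.heappop(heap)[1] for _ in range(len(heap))]

    # Multi-threaded downloader: map() yields results in the order of its input.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(download, ordered))

    # Store metadata (title) and text for each page.
    return {url: {"title": title, "text": text} for url, title, text in results}


store = run_pipeline(list(PAGES))
for url, record in store.items():
    print(url, record["title"])
```

The downloads run concurrently, but `ThreadPoolExecutor.map` returns results in input order, so the stored records still follow the scheduler's priorities.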

Today there are many search engine spiders, or crawlers. The one used by Google is Googlebot. Without web spiders, search engine results pages would return either no results or outdated content, because new web pages would never be listed. In fact, there would be no such thing as online research.

November 29, 2017