epg/siteini.pack/Hong Kong/tvb.com.pearl.ini

**------------------------------------------------------------------------------------------------
* @header_start
* WebGrab+Plus ini for grabbing EPG data from TvGuide websites
* @Site: tvb.com.pearl
* @MinSWversion: V2.1.9
* @Revision 1 - [15/03/2020] WGT
* @Revision 0 - [10/05/2014] Jan van Straaten
*   - creation
* @Remarks:
* 	Only for Pearl, a channel that broadcasts partly in English.
* @header_end
**------------------------------------------------------------------------------------------------
site {url=tvb.com|timezone=Asia/Singapore|maxdays=8|cultureinfo=en-GB|charset=UTF-8|titlematchfactor=90}
*
urldate.format {datestring|yyyy-MM-dd}
url_index{url|http://programme.tvb.com/ajax.php?action=channellist&code=|channel|&date=|urldate|}
url_index.headers {customheader=Accept-Encoding=gzip,deflate}     * request compressed responses to speed up downloading the index pages
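* Example of the resulting request, assuming site_id "P" (the value built in the channel file
* creation block at the end of this file) and an illustrative date of 15 March 2020:
*   http://programme.tvb.com/ajax.php?action=channellist&code=P&date=2020-03-15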
index_showsplit.scrub {multi|<ul class="clearfix"|<li class="item||</ul>}
*
index_date.scrub {single|<ul class="clearfix"|date="|">|<li}
index_start.scrub {single|<span class="time">||</span>|</span>}
*index_temp_1 is the index_title candidate
index_temp_1.scrub {single|<p class="ftit">||</p>|</div>}
index_temp_1.modify {cleanup(tags="<"">")}
*remove Chinese characters (CJK Unified Ideographs, U+4E00 to U+9FFF):
index_temp_1.modify {remove(type=regex)|"([\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF]+)"}
index_temp_1.modify {remove(type=regex)|\<.*?cite.*?\>} * removes any leftover cite tags
index_temp_1.modify {replace(type=regex)|"([\u00A0]+)"| } * replaces runs of non-breaking spaces (U+00A0, UTF-8 bytes C2 A0) with a normal space
index_temp_1.modify {remove(type=regex)|\[.*?\]}
index_temp_1.modify {remove|[]}
index_temp_1.modify {cleanup}
index_temp_1.modify {remove(type=regex)|"(^\W+)"}
* divide title and subtitle at the first :
index_temp_1.modify {replace|:|\|}
index_title.modify {substring(type=element)|'index_temp_1' 0 1}
index_temp_1.modify {substring(type=element)|1} * the remaining part, move to subtitle
index_temp_1.modify {replace|\||: } * turn any remaining separators back into colons so the remainder is a single string
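* Worked example of the split above (hypothetical title; a sketch that assumes element 0 is the
* text before the first separator and that substring(type=element)|1 keeps everything after it):
*   "Big Cats: Lions" gives index_title "Big Cats"; the remainder "Lions" stays in index_temp_1
*   and is prepended to index_subtitle further down when it is not empty.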
index_subtitle.scrub {single|<h4><a href="|">|</a>|</a>}
index_subtitle.modify {remove(type=regex)|"(.*[\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF]+)"} * removes everything up to and including the last Chinese character
index_subtitle.modify {addstart(2 'index_temp_1' not "")|'index_temp_1'. }
index_temp_2.scrub {single|<h4><a href="|<p>|</p>|</p>} * will become the description
index_temp_2.modify {remove(type=regex)|"([\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF]+)"} * removes the Chinese characters
index_temp_2.modify {cleanup}
*
index_description.scrub {single|<p class="full" >||</p>|</div>}
index_description.modify {cleanup(tags="<"">")}
index_temp_3.scrub {single|<p class="short"|">|<|<}
index_description.modify {addend("")|'index_temp_3'} * fall back to the short description when the full one is empty
index_description.modify {cleanup}

*_  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _
**      #####  CHANNEL FILE CREATION (only to create the xxx-channel.xml file)
**
** @auto_xml_channel_start
*index_site_id.scrub {multi|}
*index_site_id.modify {addstart|P}
*index_site_channel.modify {addstart|Pearl}
** @auto_xml_channel_end