**------------------------------------------------------------------------------------------------
* @header_start
* WebGrab+Plus ini for grabbing EPG data from TvGuide websites
* @Site: tvb.com.pearl
* @MinSWversion: V2.1.9
* @Revision 1 - [15/03/2020] WGT
* @Revision 0 - [10/05/2014] Jan van Straaten
* - creation
* @Remarks:
* only for the partly English channel Pearl.
* @header_end
**------------------------------------------------------------------------------------------------
site {url=tvb.com|timezone=Asia/Singapore|maxdays=8|cultureinfo=en-GB|charset=UTF-8|titlematchfactor=90}
*
urldate.format {datestring|yyyy-MM-dd}
url_index{url|http://programme.tvb.com/ajax.php?action=channellist&code=|channel|&date=|urldate|}
url_index.headers {customheader=Accept-Encoding=gzip,deflate} * to speed up downloading of the index pages
index_showsplit.scrub {multi|<ul class="clearfix"|<li class="item||</ul>}
*
index_date.scrub {single|<ul class="clearfix"|date="|">|<li}
index_start.scrub {single|<span class="time">||</span>|</span>}
*index_temp_1 is the index_title candidate
index_temp_1.scrub {single|<p class="ftit">||</p>|</div>}
index_temp_1.modify {cleanup(tags="<"">")}
*remove Chinese chars:
index_temp_1.modify {remove(type=regex)|"([\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF]+)"} * removes the Chinese chars
index_temp_1.modify {remove(type=regex)|\<.*?cite.*?\>} *
index_temp_1.modify {replace(type=regex)|"([ ]+)"| } * replaces a strange space C2A0
index_temp_1.modify {remove(type=regex)|\[.*?\]}
index_temp_1.modify {remove|[]}
index_temp_1.modify {cleanup}
index_temp_1.modify {remove(type=regex)|"(^\W+)"}
* divide title and subtitle at the first :
index_temp_1.modify {replace|:|\|}
index_title.modify {substring(type=element)|'index_temp_1' 0 1}
index_temp_1.modify {substring(type=element)|1} * the remaining part, move to subtitle
index_temp_1.modify {replace|\||: } * turn any remaining element separators back into ': ' so the rest is a single string
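* Illustrative sketch of the title/subtitle split above. The title is hypothetical (not taken
* from the site) and the exact whitespace handling is an assumption:
*   index_temp_1 after the cleanup steps :  Some Show: Part 1
*   after replacing ':' with '|'         :  Some Show| Part 1   (two elements)
*   index_title (element 0)              :  Some Show
*   index_temp_1 (element 1 onwards)     :  Part 1   (prepended to index_subtitle further down)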
index_subtitle.scrub {single|<h4><a href="|">|</a>|</a>}
index_subtitle.modify {remove(type=regex)|"(.*[\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF]+)"} * removes everything up to and including the last Chinese char
index_subtitle.modify {addstart(2 'index_temp_1' not "")|'index_temp_1'. }
index_temp_2.scrub {single|<h4><a href="|<p>|</p>|</p>} * will become the description
index_temp_2.modify {remove(type=regex)|"([\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF]+)"} * removes the Chinese chars
index_temp_2.modify {cleanup}
*
index_description.scrub{single|<p class="full" >||</p>|</div>}
index_description.modify {cleanup(tags="<"">")}
index_temp_3.scrub {single|<p class="short"|">|<|<}
index_description.modify {addend("")|'index_temp_3'}
index_description.modify {cleanup}
*_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
** ##### CHANNEL FILE CREATION (only to create the xxx-channel.xml file)
**
** @auto_xml_channel_start
*index_site_id.scrub {multi|}
*index_site_id.modify {addstart|P}
*index_site_channel.modify {addstart|Pearl}
** @auto_xml_channel_end