**------------------------------------------------------------------------------------------------ * @header_start * WebGrab+Plus ini for grabbing EPG data from TvGuide websites * @Site: xin.msn.com * @MinSWversion: V1.1.1/54 * @Revision 0 - [11/01/2015] Francis De Paemeleere * - creation * @Remarks: * @header_end **------------------------------------------------------------------------------------------------ site {url=xin.msn.com|timezone=Asia/Singapore|maxdays=14.1|cultureinfo=en-US|charset=UTF-8|titlematchfactor=90|nopageoverlaps} urldate.format {daycounter|0} url_index{url|http://tvguide.xinmsnent.mediacorp.sg/starterkit/servlet/xintv/tvGuide?mcTvGuideId=1344396&number=0&timezone=%2B08%3A00®ion=|channel|&tvGuideType=tvguide+week+schedule} url_index.headers {customheader=Accept-Encoding=gzip,deflate} * to speedup the downloading of the index pages index_showsplit.scrub {regex||"pname".*?\}||} scope.range {(splitindex)|end} index_showsplit.modify {cleanup(removeduplicates=equal span=1)} end_scope index_start.scrub {regex||^.*?"showTimeInMillisecond"\s*:\s*([+-]?\d*)||} index_stop.scrub {regex||^.*?"endTime"\s*:\s*([+-]?\d*)||} index_title.scrub {regex||^.*?"pname"\s*:\s*"([^"\\]*(?:\\.[^"\\]*)*)"||} index_description.scrub {regex||^.*?"description"\s*:\s*"([^"\\]*(?:\\.[^"\\]*)*)"||} index_subtitle.scrub {regex||^.*?"episode"\s*:\s*"([^"\\]*(?:\\.[^"\\]*)*)"||} scope.range {(indexshowdetails)|end} index_start.modify {calculate(format=utctime)} index_stop.modify {calculate(format=utctime)} index_title.modify {cleanup(style=jsondecode)} index_subtitle.modify {cleanup(style=jsondecode)} index_description.modify{cleanup(style=jsondecode)} *extract HD quality from titlematchfactor index_videoquality.modify {substring(type=regex)|'index_title' "\s*\(\s*(HD)\s*\)$"} index_title.modify {remove(type=regex)|"(\s*\(\s*HD\s*\))$"} * extract PG info from subtitle index_rating.modify {substring(type=regex)|'index_subtitle' "\s*\(\s*(PG[^\)]*)\s*\)\s*"} index_subtitle.modify {remove(type=regex)|"(\s*\(\s*PG[^\)]*\s*\)\s*)"} * extract cast from description index_actor.modify {substring(type=regex)|'index_description' "\s*Cast[s]?\s*:(?:\s*([^,;]*)[,;])*\s*([^,;]*)\s*."} index_description.modify {remove(type=regex)|"(\s*Cast[s]?\s*:(?:\s*[^,;]*[,;])*\s*[^,;]*\s*.)"} index_producer.modify {substring(type=regex)|'index_description' "\s*Producer[s]?\s*:(?:\s*([^,;]*)[,;])*\s*([^,;]*)\s*."} index_description.modify {remove(type=regex)|"(\s*Producer[s]?\s*:(?:\s*[^,;]*[,;])*\s*[^,;]*\s*.)"} end_scope *index_urlshow {url|} index_urlshow.headers {customheader=Accept-Encoding=gzip,deflate} * to speedup the downloading of the detail pages ** _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ** ##### CHANNEL FILE CREATION (only to create the xxx-channel.xml file) ** ** @auto_xml_channel_start *url_index{url|http://tvguide.xinmsnent.mediacorp.sg/starterkit/servlet/page/xintv/guide} *index_site_id.scrub {regex||]*id="tvScheullTimeZone"[^>]*>\s*(]*value=".*?"[^>]*>.*?\s*)*||} *scope.range {(channellist)|end} *index_site_channel.modify {addstart|'index_site_id'} *index_site_channel.modify {substring(type=regex)|]*value=".+?"[^>]*>(.+?)} *index_site_id.modify {substring(type=regex)|]*value="(.+?)"[^>]*>.+?} *index_site_id.modify {cleanup(removeduplicates=equal,100 link="index_site_channel")} *end_scope ** @auto_xml_channel_end