100 lines
5.0 KiB
INI
Executable File
100 lines
5.0 KiB
INI
Executable File
**------------------------------------------------------------------------------------------------
|
|
* @header_start
|
|
* WebGrab+Plus ini for grabbing EPG data from TvGuide websites
|
|
* @Site: npo.nl
|
|
* @MinSWversion: V1.1.1/53.5
|
|
* @Revision 0 - [15/12/2013] Jan van Straaten
|
|
* - creation
|
|
* @Remarks: successor of gids.publiekeomroep.nl
|
|
* @header_end
|
|
**------------------------------------------------------------------------------------------------
|
|
* is this a monday to monday site?
|
|
*
|
|
site {url=npo.nl|timezone=UTC+01:00|maxdays=6|cultureinfo=nl-NL|charset=utf-8|titlematchfactor=50}
|
|
url_index{url()|http://www.npo.nl/gids/verticaal/|urldate|/content}
|
|
urldate.format {datestring|dd-MM-yyyy}
|
|
|
|
* The indexpage lists all channels (24) in hour blocks within each hour block
|
|
* the channel data is separated by <td .. </td> tags
|
|
* There is no channel_id inside the channel blocks, instead the channels are listed *in a fixed order,
|
|
* the first <td .. </td> channel block is Nederland 1 , the *second Nederland 2 etc.
|
|
* This order is the site_id of the channellist file.
|
|
* the next showsplit splits the index in hour blocks, 24 elements in one hour, one for each channel
|
|
index_showsplit.scrub {multi(exclude="padder right")|td class='padder left'></td>|<td|</td>|</table>}
|
|
index_variable_element.modify {addstart(scope=splitindex)|'config_site_id'}
|
|
index_showsplit.modify {substring(type=element)|'index_showsplit' 'index_variable_element' 1/24} * this selects the proper channel
|
|
index_showsplit.modify {select()|"<div class=\'time\'>" ~} *ignore empty shows
|
|
index_showsplit.modify {replace|</a>|</a>\|} * split in ndividual shows
|
|
|
|
<div class='time'>
|
|
index_urlshow {url()|http://www.npo.nl|<a href="||"|class=}
|
|
*index_urlchannellogo {url| }
|
|
*
|
|
scope.range {(indexshowdetails)|end}
|
|
index_start.scrub {single|<div class='time'>||</div>|</div>}
|
|
*index_stop.scrub {single|data-end-hour="||"|</div>}
|
|
*index_temp_1.scrub {single|data-end-minutes="||"|</div>}
|
|
*index_stop.modify {addend()|:'index_temp_1}
|
|
index_category.scrub {single(separator=",")|data-genre="||"|"}
|
|
index_title.scrub {single(separator=":" include=first)|<div class='program-title'>||</div>|</div>}
|
|
index_subtitle.scrub {single(separator=":" exclude=first)|<div class='program-title'>||</div>|</div>}
|
|
index_category.modify {replace(not "19")|9|nieuws/actualiteit}
|
|
index_category.modify {replace|10|amusement}
|
|
index_category.modify {replace|11|informatief}
|
|
index_category.modify {replace|12|religieus}
|
|
index_category.modify {replace|13|jeugd}
|
|
index_category.modify {replace|14|serie/soap}
|
|
index_category.modify {replace|15|overige}
|
|
index_category.modify {replace|16|documentaire}
|
|
index_category.modify {replace|17|sport}
|
|
index_category.modify {replace|18|misdaad}
|
|
index_category.modify {replace|19|kunst/cultuur}
|
|
index_category.modify {replace|20|erotiek}
|
|
index_category.modify {replace|21|animatie}
|
|
index_category.modify {replace|22|natuur}
|
|
index_category.modify {replace|23|comedy}
|
|
index_category.modify {replace|24|muziek}
|
|
index_category.modify {replace|25|film}
|
|
index_category.modify {replace|26|educatief}
|
|
index_category.modify {replace|27|gezondheid}
|
|
index_category.modify {replace|28|wetenschap}
|
|
index_category.modify {replace|30|kinderen 6-12}
|
|
index_category.modify {replace|31|drama}
|
|
index_category.modify {replace|32|kinderen 2-5}
|
|
index_category.modify {replace|33|klassiek}
|
|
end_scope
|
|
*
|
|
title.scrub {single()|<h1 title=|>|</h1>|</div>}
|
|
title.scrub {single()|<h1>||</h1><h2>|</div>} * alternative
|
|
title.modify {cleanup(removeduplicates)}
|
|
title.modify {addstart("")|'index_title'} * some detail page have no title!
|
|
title.modify {cleanup}
|
|
subtitle.scrub {single|<h1 title=|<h2>|</h2>|</div>}
|
|
subtitle.scrub {single|</h1><h2>||</h2>|</div>} *alternative
|
|
subtitle.modify {cleanup(removeduplicates)}
|
|
temp_1.scrub {single()|<h1 title=|(|)|</div>}
|
|
description.modify {addend('temp_1' not "")| ('temp_1')} * omroep
|
|
description.scrub {multi(include="name='description")|<meta content='||'>|'>}
|
|
description.modify {remove()|\' name=\'description}
|
|
rating.scrub {multi|<i class="npo-icon-kijkwijzer||">|</div>}
|
|
rating.modify {remove(type=regex)|"(\A.+title=\")"}
|
|
rating.modify {remove|age-}
|
|
*
|
|
** _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
|
|
** ##### CHANNEL FILE CREATION (only to create the xxx-channel.xml file)
|
|
**
|
|
** @auto_xml_channel_start
|
|
* site_id is set to the order nr in which the channels are listed on the index page
|
|
*scope.range {(channellist)|end}
|
|
*index_temp_2.scrub {multi()|<ul class='scroll-header'>|/>|</li>|</ul>}
|
|
*index_temp_2.modify {cleanup}
|
|
*index_site_channel.modify {addstart|'index_temp_2'}
|
|
*index_site_id.scrub {multi||||} * dummy needed to activate the channellist creation
|
|
*index_temp_1.modify {calculate(type=element format=F0)|'index_site_channel' #}
|
|
*loop {('index_temp_1' > "0" max=50)|end}
|
|
*index_temp_1.modify {calculate(format=F0)|1 -}
|
|
*index_site_id.modify {addstart|'index_temp_1'####}
|
|
*end_loop
|
|
*index_site_id.modify {replace|####|\|}
|
|
*end_scope
|
|
** @auto_xml_channel_end |