epg/siteini.pack/MDB postprocessor/imdb.com.imdb_series.ini

**------------------------------------------------------------------------------------------------
* @header_start
* WebGrab+Plus ini for grabbing IMDB data from TvGuide websites
* @MinSWversion: V1.1.1/56.25
*     -  (postprocess V2.0)
* @Site: imdb.com
* @Revision 7 - [25/05/2016] Jan van Straaten
* 	- support for match on episode-num or sub-title, use of new scopes
*	- added mdbinitype
* @Revision 6 - [12/02/2016] Jan van Straaten
* 	- fix of the episode and productiondate
* @Revision 5 - [07/12/2015] Jan van Straaten
*   - change element names, mdb_show_id and mdb_episode_id
* @Revision 4 - [30/09/2015] Jan van Straaten
*     - added mdb-category
* @Revision 3 - [11/08/2014] Jan van Straaten
*     - improved mdb_episode_id selection
* @Revision 2 - [09/06/2014] Jan van Straaten
*	  - added url header
* @Revision 1 - [24/11/2013] Jan van Straaten
*     - version check enabled
* @Revision 0 - [09/11/2013] Jan van Straaten
*     - creation
* @Remarks: Series data extraction. English version
* @header_end
**------------------------------------------------------------------------------------------------
*
*
site {url=imdb.com|mdbinitype=serie|cultureinfo=en-GB|charset=UTF-8|matchfactor=70|searchsite=imdb|episodesystem=xmltv_ns}
scope.range {(primarysearch)|end}
* primary search (using imdb's advanced search):
url_primarysearch {url()|http://www.imdb.com/search/title?&title=|'title'|&title_type=tv_series}
url_primarysearch.modify {replace| |%20}
*http://www.imdb.com/search/title?title=Touched%20by%20an%20Angel&title_type=tv_series
url_primarysearch.headers {customheader=Accept-Encoding=gzip,deflate}
mdb_show_id.scrub {multi|primary|<span class="wlb_wrapper"|<a href="/title/tt|/">|</a>}
*
* imdb url's:
url_mdb_p1.modify {addstart()|http://www.imdb.com/title/tt'mdb_show_id'/epdate} * all the episodes date sorted with episode title and mdb_episode_id
url_mdb_p2.modify {addstart()|http://www.imdb.com/title/tt'mdb_episode_id'} * the episode detail page
* or http://www.imdb.com/title/tt0553267/?ref_=tt_ep_pr = same
url_mdb_p3.modify {addstart|http://www.imdb.com/title/tt'mdb_episode_id'/synopsis?ref_=tt_stry_pl} * the full synopsis
* is same as * http://www.imdb.com/title/tt2288518/synopsis full synopsis
url_mdb_p4.modify {addstart|http://www.imdb.com/title/tt'mdb_episode_id'/fullcredits?ref_=tt_ql_1} *full cast and crew (director, writer, actor)
* http://www.imdb.com/title/tt2288518/fullcredits?ref_=tt_ql_1
url_mdb_p5.modify {addstart|http://www.imdb.com/title/tt'mdb_episode_id'/plotsummary?ref_=tt_ql_5} *plot summary (not used)
url_mdb_p6.modify {addstart|http://www.imdb.com/title/tt'mdb_episode_id'/reviews?ref_=tt_ql_7} *user reviews
*
url_mdb.headers {customheader=Accept-Encoding=gzip,deflate}
end_scope
*
scope.range {(match)|end}
* imdb elements
* possible mustmatch elements
* episodetitle (sub-title)
mdb_episodetitlelist.scrub {multi()|p1|<a href="/title/tt|/">|</a>|</td>}
* episode-num
mdb_episodenumlist.scrub {regex(pattern="'S1'.'E1'")|p1||<td align="right" bgcolor="#eeeeee">(.+?)</td>||}
*
mdb_title.scrub {single(separator="(" include=first exclude="</span>")|p1|<span class="title-extra">||<i>|</span>} * the original title
mdb_title.scrub {single(separator="(" include=first)|p1|<title>||</title>|</title>}
mdb_title.modify {cleanup(tags="/=\"")} * removes starting "
mdb_title.modify {cleanup(tags="\"=/")}
*
** aka's not yet implemented if at all possible
*mdb_title.scrub {multi(separator=" - ")|p3|<h5><a name="akas">Also Known As (AKA)</a></h5>|<tr>\n<td>|</td>|</table>} *aka's
*mdb_title.scrub {multi|p3|<h5><a name="akas">Also Known As (AKA)</a></h5>|<tr>\n<td>|</td>|</table>} *aka's
*
mdb_temp_6.scrub {regex()|p1||<tr>\s+?(<td align="right".+?)</tr>||} * all the episodes *
end_scope
*
scope.range {(getelements)|end}
* in case of matched subtitle
mdb_temp_1.modify {calculate('mdb_episodetitlelist' not "" type=element format=F0)|'mdb_episodetitlelist' 'mdb_subtitle' @} * index of the episode
* in case of matched episodenum
mdb_temp_1.modify {calculate('mdb_episodenumlist' not "" type=element format=F0)|'mdb_episodenumlist' 'mdb_episode' @} * index of the episode
mdb_temp_1.modify {substring(type=element)|'mdb_temp_6' 'mdb_temp_1' 1} * the episode in xml format
*
* elements from mdb_temp_1 (the episode)
** get the mdb_episode_id
mdb_episode_id.modify {substring(type=regex)|'mdb_temp_1' "(\d\{7\})/\">"} * get the tt nbr for the episode
*
* the following elements are taken from the episode detail page mdb-p2
* there is a story line, director and actors , starrating, episodenum
* also a 'full synopsys' on a separate page mdb-p3 ??
*
** full productiondate
mdb_productiondate.scrub {single()|p2|<meta itemprop="datePublished"|content="|"|/>}
mdb_productiondate.modify {calculate(format=productiondate)} * only year allowed!
mdb_category.scrub {regex()|p2||<span class=\"itemprop\" itemprop=\"genre\">(.+?)</span></a>||}
mdb_actor.scrub {multi()|p4|?ref_=ttfc_fc_cl_t|itemprop="name">|</span>|</a>}
mdb_director.scrub {multi|p4|?ref_=ttfc_fc_dr|" >|</a>|</td>}
mdb_starrating.scrub {single()|p2|<div class="ratingValue">|itemprop="ratingValue">|</span>|</div>}
mdb_starratingvotes.scrub {single|p2|<div class="ratingValue">|based on|user ratings|</div>}
mdb_showicon.scrub {single|p2|Poster"|src="|"|"image" />}
mdb_commentsummary.scrub {multi(exclude="SPOILERS ARE INCLUDED""This review may contain spoilers""Add another review" include=first)|p6|<a href="reviews-index?">|<h2>|</h2>|Add another review}
mdb_review.scrub {multi(exclude="SPOILERS ARE INCLUDED""This review may contain spoilers""Add another review" include=first)|p6|<a href="reviews-index?">|<p>|</p>\n\n|Add another review}
mdb_plot.scrub {single(separator="<em" include=first)|p2|<h2>Storyline</h2>|<p>|</p>|</div>}
mdb_description.scrub {regex|p2||<meta name=\"description\" content=\"(.+?)\" />||}
*
* subtitle when not already done with episodetitlelist
mdb_subtitle.modify {substring("" type=regex)|'mdb_temp_1' "<td><a href=\"/title/tt\d+?/\">(.+?)</a></td>"}
* episode must be last because it is used to get mdb_temp_1  (the actual episode data from mdb_temp_6)
mdb_episode.modify {clear}
loop {('mdb_episode' "" max=1)|end}
mdb_episode.modify {substring(type=regex)|'mdb_temp_1' "<td align=\"right\" bgcolor=\"#eeeeee\">(.+?)</td>"}
mdb_temp_3.modify {substring(type=regex)|'mdb_episode' "\.(\d*)"} * episode part
mdb_episode.modify {substring(type=regex)|'mdb_episode' "(\d*)\."} * the season part
* onsceen format
mdb_episode.modify {addstart(not "")|S}
mdb_episode.modify {addend('mdb_temp_3' not "")|E'mdb_temp_3'}
* convert to xmltv_ns
*mdb_temp_3.modify {calculate(not "" format=F0)|1 -}
*mdb_episode.modify {substring(type=regex)|'mdb_episode' "(\d*)\."} * the season part
*mdb_episode.modify {calculate(not "" format=F0)|1 -}
*mdb_episode.modify {addend()|.'mdb_temp_3'.}
end_loop
end_scope