93 lines
4.9 KiB
INI
Executable File
93 lines
4.9 KiB
INI
Executable File
**------------------------------------------------------------------------------------------------
|
|
* @header_start
|
|
* WebGrab+Plus ini for grabbing IMDB data from TvGuide websites
|
|
* @MinSWversion: V3.0.0
|
|
* @Site: imdb.com, primary search with ask.com
|
|
* @Revision 13 - [05/04/2020] Jan van Straaten
|
|
* - full rewrite
|
|
* @Revision 12 - [08/10/2017] Jan van Straaten
|
|
* - added actor role
|
|
* @Revision 11 - [22/05/2016] Jan van Straaten
|
|
* - added mdbinitype
|
|
* - fixed star-rating and original title
|
|
* @Revision 10 - [07/12/2015] Jan van Straaten
|
|
* - change element names, mdb_show_id and mdb_episode_id
|
|
* @Revision 9 - [25/09/2015] Jan van Straaten
|
|
* - added mdb_category
|
|
* @Revision 8 - [10/10/2014] Jan van Straaten
|
|
* - improved showid scub, also numbers upto 5000000 (was 2500000)
|
|
* @Revision 7 - [09/06/2014] Jan van Straaten
|
|
* - added url header
|
|
* @Revision 6 - [07/06/2014] Jan van Straaten/Jagad
|
|
* - added mdb_showicon
|
|
* @Revision 5 - [20/12/2013] Jan van Straaten
|
|
* - changes in (aka)titles
|
|
* @Revision 4 - [23/11/2013] Jan van Straaten
|
|
* - changes in actor and director due to site changes
|
|
* @Revision 3 - [11/08/2013] Jan van Straaten
|
|
* - small changes in commentsummary due to imdb.com changes
|
|
* @Revision 2 - [16/02/2013] Jan van Straaten
|
|
* - small changes in actor due to imdb.com changes
|
|
* @Revision 1 - [14/04/2012] Jan van Straaten
|
|
* - correction in production date
|
|
* @Remarks: none
|
|
* @header_end
|
|
**------------------------------------------------------------------------------------------------
|
|
site {url=imdb.com|mdbinitype=movie|cultureinfo=en-GB|charset=UTF-8|matchfactor=60|searchsite=ask}
|
|
*
|
|
scope.range {(primarysearch)|end}
|
|
* primary search:
|
|
url_primarysearch {url(urlencode=1,2,3,4)|https://www.ask.com/web?&q=|imdb+|'title'|+|'credit'|&/NCR}
|
|
mdb_show_id.scrub {regex|primary||/tt(\d{7})/||}
|
|
|
|
* imdb url's:
|
|
url_mdb_p1 {url()|primary|https://www.imdb.com/title/tt|mdb_show_id|/}
|
|
url_mdb_p2 {url|primary|https://www.imdb.com/title/tt|mdb_show_id|/plotsummary}
|
|
url_mdb_p3 {url|primary|https://www.imdb.com/title/tt|mdb_show_id|/releaseinfo#akas}
|
|
url_mdb_p4 {url|primary|https://www.imdb.com/title/tt|mdb_show_id|/reviews}
|
|
url_mdb_p5 {url|primary|https://www.imdb.com/title/tt|mdb_show_id|/fullcredits#cast}
|
|
*url_mdb_p6 {url|primary|https://www.imdb.com/title/tt|mdb_show_id|/criticreviews?ref_=tturv_sa_5
|
|
*
|
|
url_mdb.headers {customheader=Accept-Encoding=gzip,deflate,br}
|
|
|
|
end_scope
|
|
*imdb elements
|
|
scope.range {(match)|end} * musthaves
|
|
mdb_title.scrub {single|p3|<td class="aka-item__name"> (original title)</td>|<td class="aka-item__title">|</td>|</tr>}* original title
|
|
mdb_title.scrub {single(separator=" - " exclude="IMDb" include=first)|p1|<head>|<title>|(|</title>} * OK
|
|
mdb_title.scrub {regex|p3||<td class="aka-item__title">(.+?)</td>||} *aka's *OK
|
|
end_scope
|
|
scope.range {(getelements)|end}
|
|
mdb_actor.scrub {regex|p5||name/nm\d+?/\?ref_=ttfc_fc_cl_t\d+?\"\s>(.+?)\s+?</a>.+?<td class=\"character\">\s+?(.+?)\s+?</td>||}
|
|
mdb_actor.modify {substring(type=element)|0 24} * keep the first 12 actors (double because still name|role format
|
|
mdb_actor.modify {remove(type=regex)|"<a href=\"/title/tt\d+?/characters/nm\d+?\?ref_=ttfc_fc_cl_t\d+?\" >"}
|
|
mdb_actor.modify {replace|\||###}
|
|
mdb_actor.modify {replace(type=regex)|"(###\s{5,})\w+"| (role=}
|
|
mdb_actor.modify {replace|###|\|}
|
|
** the next is only necessary if actor still contains 9 or more spaces (after the role name)
|
|
** some p5 pages have a very simple actor listing without these spaces, so we add them
|
|
mdb_actor.modify {addend| }
|
|
mdb_actor.modify {substring(type=regex)|"\A\s(.+?\s\(role=.+?)\s{9}"}
|
|
mdb_actor.modify {remove|</a>}
|
|
mdb_actor.modify {addend(not "")|)}
|
|
*
|
|
mdb_director.scrub {multi|p1|"director":|"name": "|"|\},}
|
|
mdb_director.scrub {multi|p5|/?ref_=ttfc_fc_dr|"\r> |</a>|</td>} * fulllist
|
|
mdb_director.modify {substring(type=element)|0 5} * keep the first 6
|
|
*
|
|
mdb_productiondate.scrub {single|p1|<title>||</title>|</title>}
|
|
mdb_category.scrub {multi|p1|genres&ref_=tt_ov_inf|>|</a>|</a>}
|
|
mdb_category.modify {cleanup(removeduplicates)}
|
|
*
|
|
mdb_description.scrub {single|p1|<meta name="description"|content="|" />|<meta}
|
|
mdb_showicon.scrub {single|p1|Poster"|src="|"|"image" />}
|
|
mdb_starrating.scrub {single()|p1|<div class="ratingValue">|itemprop="ratingValue">|</span>|</div>}
|
|
mdb_starratingvotes.scrub {single|p1|<div class="ratingValue">|based on|user ratings|</div>}
|
|
mdb_country.scrub {regex|p1||<h4 class=\"inline\">Release Date:</h4>.+?\((.+?)\)||}
|
|
mdb_plot.scrub {single|p2|id="plot-summaries-content">|<p>|</p>|</ul>}
|
|
mdb_commentsummary.scrub {multi(max=5 excludeblock="Warning: Spoilers")|p4|class="title" >||</a>|<div class="ipl-expander ">}
|
|
mdb_review.scrub {multi(max=1 excludeblock="Warning: Spoilers")|p4|class="title" >|<div class="text show-more__control">|</div>|<div class="actions text-muted">}
|
|
mdb_review.modify {cleanup}
|
|
end_scope
|
|
|
|
* |