16433 Commits

Author SHA1 Message Date
dirkf
1036478d13 [YouTube] Ensure subtitle URLs are complete
* WEB URLs are, MWEB not
* resolves #33017
2025-01-06 01:39:04 +00:00
dirkf
00ad2b8ca1 [YouTube] Refactor subtitle processing
* move to internal function
* use `traverse_obj()`
2025-01-06 01:24:30 +00:00
dirkf
ab7c61ca29 [YouTube] Apply code style changes, trailing commas, etc 2025-01-06 01:22:16 +00:00
dirkf
176fc2cb00 [YouTube] Avoid early crash if webpage can't be read
* see issue #33013
2024-12-31 14:51:29 +00:00
dirkf
d55d1f423d [YouTube] Always extract using MWEB API client
* temporary fix-up for 403 on download
* MWEB parameters from yt-dlp 2024-12-06
2024-12-16 12:38:51 +00:00
dirkf
eeafbbc3e5 [YouTube] Fix signature function extraction for 2f1832d2
* `_` was omitted from patterns
* thx yt-dlp/yt-dlp#11801

Co-authored-by: bashonly
2024-12-16 12:38:51 +00:00
dirkf
cd7c7b5edb [YouTube] Simplify pattern for nsig function name extraction 2024-12-16 12:38:51 +00:00
dirkf
eed784e15f [YouTube] Pass nsig value as return hook, fixes player 3bb1f723 2024-12-16 12:38:51 +00:00
dirkf
b4469a0f65 [YouTube] Handle player 3bb1f723
* fix signature code extraction
* raise if n function returns input value
* add new tests from yt-dlp

Co-authored-by: bashonly
2024-12-16 12:38:51 +00:00
dirkf
ce1e556b8f [jsinterp] Add return hook for player 3bb1f723
* set var `_ytdl_do_not_return` to a specific value in the scope of a function
* if an expression to be returned has that value, `return` becomes `void`
2024-12-16 12:38:51 +00:00
dirkf
f487b4a02a [jsinterp] Strip /* comments */ when parsing
* NB: _separate() is looking creaky
2024-12-16 12:38:51 +00:00
dirkf
60835ca16c [jsinterp] Fix and improve "methods"
* push, unshift return new length
* improve edge cases for push/pop, shift/unshift, forEach, indexOf, charCodeAt
* increase test coverage
2024-12-16 12:38:51 +00:00
dirkf
94fd774608 [jsinterp] Fix and improve split/join
* improve split/join edge cases
* correctly implement regex split (not like re.split)
2024-12-16 12:38:51 +00:00
dirkf
5dee6213ed [jsinterp] Fix and improve arithmetic operations
* addition becomes concat with a string operand
* improve handling of edgier cases
* arithmetic in float like JS (more places need cast to int?)
* increase test coverage
2024-12-16 12:38:51 +00:00
dirkf
81e64cacf2 [jsinterp] Support multiple indexing (eg a[1][2])
* extend single indexing with improved RE (should probably use/have used _separate_at_paren())
* fix some cases that should have given undefined, not throwing
* standardise RE group names
* support length of objects, like {1: 2, 3: 4, length: 42}
2024-12-16 12:38:51 +00:00
dirkf
c1a03b1ac3 [jsinterp] Fix and improve loose and strict equality operations
* reimplement loose equality according to MDN (eg, 1 == "1")
* improve strict equality (eg, "abc" === "abc" but 'abc' is not 'abc')
* add tests for above
2024-12-16 12:38:51 +00:00
dirkf
118c6d7a17 [jsinterp] Implement typeof operator 2024-12-16 12:38:51 +00:00
dirkf
f28d7178e4 [InfoExtractor] Use kwarg maxsplit for re.split
* May become kw-only in future Pythons
2024-12-16 12:38:51 +00:00
dirkf
c5098961b0 [Youtube] Rework n function extraction pattern
Now also succeeds with player b12cc44b
2024-08-06 20:59:09 +01:00
dirkf
dbc08fba83 [jsinterp] Improve slice implementation for player b12cc44b
Partly taken from yt-dlp/yt-dlp#10664, thx seproDev
Fixes #32896
2024-08-06 20:51:38 +01:00
Aiur Adept
71223bff39
[Youtube] Fix nsig extraction for player 20dfca59 (#32891)
* dirkf's patch for nsig extraction
* add generic search per yt-dlp/yt-dlp/pull/10611 - thx bashonly

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-08-01 19:18:34 +01:00
dirkf
e1b3fa242c [Youtube] Find n function name in player 3400486c
Fixes #32877
2024-07-25 00:16:00 +01:00
dirkf
451046d62a [Youtube] Make n-sig throttling diagnostic up-to-date 2024-07-24 14:33:34 +01:00
dirkf
16f5bbc464 [YouTube] Fix nsig processing for player b22ef6e7
* improve extraction of function name (like yt-dlp/yt-dlp#10390)
* always use JSInterp to extract function code (yt-dlp/yt-dlp#10396, thx seproDev, pukkandan)
2024-07-11 00:50:46 +01:00
dirkf
d35ce6ce95 [jsinterp] Support functionality for player b22ef6e7
* support `prototype` for call() and apply() (yt-dlp/yt-dlp#10392, thx Grub4k)
* map JS `Array` to `list`
2024-07-11 00:50:46 +01:00
dirkf
76ac69917e [jsinterp] Further improve expression parsing (fix fd8242e)
Passes tests from yt-dlp
2024-07-11 00:50:46 +01:00
dirkf
756f6b45c7 [jsinterp] Re-align JSInterp and tests (esp.) with yt-dlp
Thx: various yt-dlp authors
2024-07-11 00:50:46 +01:00
bashonly
43a74c5fa5 [core] Address gaps in allowed extensions
Adds some extensions missing in 4652109643
(from yt-dlp/yt-dlp#10362)

Authored by: bashonly
Co-authored by: dirkf
2024-07-11 00:50:46 +01:00
dirkf
a452f9437c [core] Fix PR #32830 for fixed extensionless output template 2024-07-07 22:33:32 +01:00
unkernet
36801c62df
[YandexMusic] Save track version in the title field
PR #32837
* Add track version to track title
2024-07-07 20:18:33 +01:00
Sergey Musatov
f4b47754d9
[YandexMusic] Download music in High Quality (320 Kbit/s)
PR #31159
2024-07-06 11:04:36 +01:00
dirkf
37cea84f77 [core,utils] Support unpublicised --no-check-extensions 2024-07-02 15:38:50 +01:00
dirkf
4652109643 [core,utils] Implement unsafe file extension mitigation
* from https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4, thx grub4k
2024-07-02 15:38:50 +01:00
dirkf
3c466186a8 [utils] Back-port Namespace and MEDIA_EXTENSIONS from yt-dlp
Thx pukkandan
* Namespace: https://github.com/yt-dlp/yt-dlp/commit/591bb9d355
* MEDIA_EXTENSIONS: https://github.com/yt-dlp/yt-dlp/commit/8dc5930511
2024-07-02 15:38:50 +01:00
dirkf
4d05f84325 [PalcoMP3] Conform to new linter rule
* no space after @ in decorator
2024-06-20 20:03:49 +01:00
dirkf
e0094e63c3 [jsinterp] Various tweaks
* treat Infinity like NaN
* cache operator list
2024-06-20 20:03:49 +01:00
dirkf
fd8242e3ef [jsinterp] Fix and improve expression parsing
* improve BODMAS (fixes https://github.com/ytdl-org/youtube-dl/issues/32815)
* support more weird expressions with multiple unary ops
2024-06-20 20:03:49 +01:00
dirkf
ad01fa6cca [jsinterp] Add Debugger from yt-dlp
* https://github.com/yt-dlp/yt-dlp/commit/8f53dc4
* thx pukkandan
2024-06-20 20:03:49 +01:00
dirkf
2eac0fa379 [utils] Save orig_msg in ExtractorError 2024-06-20 20:03:49 +01:00
Paper
0153b387e5
[VidLii] Add 720p support (#30924)
* [VidLii] Add HD support (yt-dlp backport-ish)

* Also fix a bug with the view count

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-06-11 13:21:39 +01:00
dirkf
a48fe7491d [ORF] Skip tests with limited availability 2024-06-11 12:52:13 +01:00
dirkf
e20ca543f0 [ORF] Re-factor and update ORFFM4StoryIE
* fix getting media via DASH instead of inaccessible mp4
* also get in-page YT media
2024-06-11 12:52:13 +01:00
dirkf
e39466051f [ORF] Support sound.orf.at, updating ORFRadioIE
* maintain support for xx.orf.at/player/... URLs
* add `ORFRadioCollectionIE` to support playlists in ORF Sound
* back-port and re-work `ORFPodcastIE` from https://github.com/yt-dlp/yt-dlp/pull/8486, thx Esokrates
2024-06-11 12:52:13 +01:00
dirkf
d95c0d203f [ORF] Support on.orf.at, replacing ORFTVthekIE
* add `ORFONIE`, back-porting yt-dlp PR https://github.com/yt-dlp/yt-dlp/pull/9113 and friends: thx HobbyistDev, TuxCoder, seproDev
* re-factor to support livestreams via new `ORFONliveIE`
2024-06-11 12:52:13 +01:00
dirkf
50f6c5668a [core] Re-factor with _fill_common_fields() as used in yt-dlp 2024-06-11 12:52:13 +01:00
dirkf
b4ff08bd2d [core] Safer handling of nested playlist data 2024-06-11 12:52:13 +01:00
kmnx
88bd8b9f87
[mixcloud] updated mixcloud API server address (#32557)
* updated mixcloud API server address
* fix tests
* etc

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-06-11 12:38:24 +01:00
dirkf
21924742f7 [InfoExtractor] Misc yt-dlp back-ports, etc
* add _yes_playlist() method
* avoid crash using _NETRC_MACHINE
* use _search_json() in _search_nextjs_data()
* _search_nextjs_data() default is JSON, not text
* test for above
2024-05-30 15:46:36 +01:00
dirkf
768ccccd9b [compat] Avoid type comparison in compat_ord
NB This isn't actually a compat fn; it should be utils.int_from_int_or_char
2024-05-30 15:46:36 +01:00
dirkf
eee9a247eb [utils] Split out traversal.py dummy and traversal tests 2024-05-30 15:46:36 +01:00
dirkf
34484e49f5 [compat] Improve compat_etree_iterfind for Py2.6
Adapted from https://raw.githubusercontent.com/python/cpython/2.7/Lib/xml/etree/ElementPath.py
2024-05-30 15:46:36 +01:00
dirkf
06da64ee51 [utils] Update traverse_obj() from yt-dlp
* remove `is_user_input` option per https://github.com/yt-dlp/yt-dlp/pull/8673
* support traversal of compat_xml_etree_ElementTree_Element per https://github.com/yt-dlp/yt-dlp/pull/8911
* allow un/branching using all and any per https://github.com/yt-dlp/yt-dlp/pull/9571
* support traversal of compat_cookies.Morsel and multiple types in `set()` keys per https://github.com/yt-dlp/yt-dlp/pull/9577
thx Grub4k for these
* also, move traversal tests to a separate class
* allow for unordered dicts in tests for Py<3.7
2024-05-30 15:46:36 +01:00
dirkf
668332b973 [YouPorn] Add playlist extractors
* YouPornCategoryIE
* YouPornChannelIE
* YouPornCollectionIE
* YouPornStarIE
* YouPornTagIE
* YouPornVideosIE
2024-04-22 01:34:26 +01:00
dirkf
0b2ce3685e [YouPorn] Improve extraction
* detect unwatchable videos
* improve duration extraction
* fix count extraction and support large values
* detect and remove SEO spam boilerplate description
2024-04-22 01:34:26 +01:00
dirkf
eb38665438 [YouPorn] Incorporate yt-dlp PR 8827
* from https://github.com/yt-dlp/yt-dlp/pull/8827
* extract from webpage instead of broken API URL
* thx The-MAGI
2024-04-22 01:34:26 +01:00
dirkf
e0727e4ab6 [postprocessor/ffmpeg] Fix finding ffprobe (bug in 21792b8)
Fixes 21792b88b7 (commitcomment-140705274), thx: vonProteus
2024-04-07 15:33:30 +01:00
Ori Avtalion
4ea59c6107
[utils] Fix crash in _report_ignoring_subs from c58b655 (#32762)
Align `utils.bug_reports_message()` with yt-dlp https://github.com/yt-dlp/yt-dlp/commit/5873d4ccdd, thanks fstirlitz

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-04-05 15:25:29 +01:00
dirkf
21792b88b7 [external/FFmpeg] Fix and improve --ffmpeg-location handling
* pass YoutubeDL (FileDownloader) to FFmpegPostProcessor constructor
* consolidate path search in FFmpegPostProcessor
* make availability of FFmpegFD depend on existence of FFmpegPostProcessor
* detect ffmpeg executable on instantiation of FFmpegFD
* resolves #32735
2024-03-27 13:11:17 +00:00
dirkf
d8f134a664 [downloader/external] Fix "Resource Warning" in downloader test
* add compat_subprocess_Popen context manager
* apply context manager in FFmpegFD._call_downloader()
2024-03-27 13:11:17 +00:00
dirkf
31a15a7c8d [compat] Simplify/fix compat_html_parser_HTMLParseError 2024-03-27 13:11:17 +00:00
dirkf
19dc10b986 [utils] Apply compat_contextlib_suppress 2024-03-27 13:11:17 +00:00
dirkf
182f63e82a [compat] Add compat_contextlib_suppress
with compat_contextlib_suppress(*Exceptions):
    # code that fails silently for any of Exceptions
2024-03-27 13:11:17 +00:00
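A minimal sketch of the pattern quoted in the `compat_contextlib_suppress` commit message above, assuming the compat name behaves like the Python 3 standard library's `contextlib.suppress`. The helper `parse_port` is hypothetical and is used only for illustration; it is not part of youtube-dl.

```python
import contextlib


def parse_port(text):
    """Return text as an int, or None if it cannot be converted.

    Hypothetical helper: shows code that "fails silently" for the
    listed exception types, as the commit message describes.
    """
    port = None
    # Any ValueError or TypeError raised inside the block is swallowed,
    # so `port` keeps its default value of None.
    with contextlib.suppress(ValueError, TypeError):
        port = int(text)
    return port
```

Exceptions of other types still propagate, so the list passed to the context manager should name only the failures that are safe to ignore.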
gy-chen
71211e7db7
[Youtube] Fix unwanted private method __ie_msg in f8b0135850
Fixes `AttributeError no attribute '_YoutubeIE__ie_msg'` if unable to decode n-parameter
2024-03-23 15:30:13 +00:00
Zizheng Guo
a96a45b2cd
[Vimeo] Improve config extraction (#32742)
* update for more robust json parsing
2024-03-12 11:44:13 +00:00
hatsomatt
820fae3b3a [Videa] Fix extraction
* update API URL
* from https://github.com/yt-dlp/yt-dlp/pull/8003
* thanks to the authors!

Closes yt-dlp/7427
Authored by: hatsomatt, aky-01
2024-03-08 13:14:52 +00:00
dirkf
aef24d97e9 [Videa] Align with yt-dlp 2024-03-08 13:14:52 +00:00
dirkf
f7b30e3f73 [XFileShare] Update extractor for 2024
* simplify aa_decode()
* review and update supported sites and tests
* in above, include FileMoon.sx, and remove separate module
* incorporate changes from yt-dlp
* allow for decoding multiple scripts (eg, FileMoon)
* use new JWPlayer extraction
2024-03-08 13:03:42 +00:00
dirkf
f66372403f [InfoExtractor] Rework and improve JWPlayer extraction
* use traverse_obj() and _search_json()
* support playlist `.load({**video1},{**video2}, ...)`
* support transform_source=... for _extract_jwplayer_data()
2024-03-08 13:03:42 +00:00
dirkf
7216fa2ac4 [InfoExtractor] Add _search_json()
* uses the error diagnostic to truncate the JSON string
* may be confused by non-C-Pythons
2024-03-08 13:03:42 +00:00
dirkf
acc383b9e3 [utils] Let int_or_none() accept a base, like int() 2024-03-08 13:03:42 +00:00
Hubert Hirtz
f0812d7848
[utils] Handle user:pass in URLs (#28801)
* Handle user:pass in URLs

Fixes "nonnumeric port" errors when youtube-dl is given URLs with
usernames and passwords such as:

    http://username:password@example.com/myvideo.mp4

Refs:
- https://en.wikipedia.org/wiki/Basic_access_authentication
- https://tools.ietf.org/html/rfc1738#section-3.1
- https://docs.python.org/3.8/library/urllib.parse.html#urllib.parse.urlsplit

Fixes #18276 (point 4)
Fixes #20258
Fixes #26211 (see comment)

* Align code with yt-dlp

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-03-04 01:27:55 +00:00
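For illustration only, the following sketch shows how Python 3's `urllib.parse.urlsplit` (linked in the refs of the commit above) separates credentials from the host for the example URL in that message. It does not reproduce the youtube-dl change itself, and the variable names are assumptions.

```python
from urllib.parse import urlsplit

# Example URL taken from the commit message above.
parts = urlsplit('http://username:password@example.com/myvideo.mp4')

# The userinfo part is exposed separately from the host name.
credentials = (parts.username, parts.password)
host = parts.hostname
port = parts.port  # None when the URL carries no explicit port
netloc = parts.netloc  # still contains the raw 'user:pass@host' text
```

Code that splits `netloc` on `:` by hand to find a port would pick up the password here, which is consistent with the "nonnumeric port" errors the commit mentions.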
Aaron Tan
40bd5c1815
[caffeine.tv] Add new extractor (#32514)
* Add CaffeineTVIE info extractor to support site caffeine.tv

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-02-22 12:54:07 +00:00
dirkf
70f230f9cf
[GBNews] Add new extractor for GB News TV channel (#29432)
* Add extractor for GB News TV channel

* Support more GBNews URL formats
Allow alphanumeric and _ in place of `shows`, which redirect to site's preferred URL

* Update for 2024
2024-02-22 12:44:00 +00:00
dirkf
48ddab1f3a
[downloader/external] Fix WgetFD proxy (rev 2)
From PR (defunct source), closes #29343.
Matches https://github.com/yt-dlp/yt-dlp/pull/3152
Thx former user kikuyan.
2024-02-21 16:29:08 +00:00
dirkf
7687389f08 [Vbox7] Improve extraction, adding features from yt-dlp PR #9100
* changes from https://github.com/yt-dlp/yt-dlp/pull/9100 (thx seproDev):
  - attempt HLS extraction
  - re-enable XFF
  - test `view_count`, `duration` extraction
* improve commenting, error checks
2024-02-19 00:53:22 +00:00
dirkf
4416f82c80 [Vbox7IE] Sanitise ld+json containing unexpected characters
* based on PR #29680
* added hack to force invoking `transform_source`
* fixes #26218
2024-02-02 12:36:05 +00:00
dirkf
bdda6b81df [Vbox7IE] Improve extraction
* DASH extraction no longer fails with new range support
* but always find combined formats if available
* suppress ineffective XFF geo-bypass (causes time-outs)
* adapted from https://github.com/ytdl-org/youtube-dl/pull/29680
* thx former GH user kikuyan
2024-02-02 12:36:05 +00:00
dirkf
1fd8f802b8 [InfoExtractor] Correctly resolve BaseURL in DASH manifest
Specs:
* ISO/IEC 23009-1:2012 section 5.6
* RFC 3986 section 5.
2024-02-02 12:36:05 +00:00
dirkf
4eaeb9b2c6 [InfoExtractor] Support byte range for DASH
* adapted from https://github.com/ytdl-org/youtube-dl/pull/30279
* thx former GH user kikuyan
2024-02-02 12:36:05 +00:00
dirkf
bec9180e89 [downloader/dash] Support range in fragment (format f'{start}-{end}')
* adapted from https://github.com/ytdl-org/youtube-dl/pull/30279
* thx former GH user kikuyan
2024-02-02 12:36:05 +00:00
dirkf
c58b655a9e [InfoExtractor] Support DASH subtitle extraction (yt-dlp back-port) 2024-02-02 12:36:05 +00:00
dirkf
dc512e3a8a [YouTube] Fix like_count extraction using likeButtonViewModel
* also fix various tests
* TODO: check against yt-dlp tests
2024-01-22 11:10:34 +00:00
dirkf
f8b0135850 [YouTube] Rework n-sig processing, realigning with yt-dlp
* apply n-sig before chunked fragments, fixes #32692
2024-01-22 11:10:34 +00:00
dirkf
640d39f03a [InfoExtractor] Support some warning and ._downloader shortcut methods from yt-dlp 2024-01-22 11:10:34 +00:00
dirkf
6651871416 [compat] Rework compat for method parameter of compat_urllib_request.Request constructor
* fixes #32573
* does not break `utils.HEADrequest` (eg)
2024-01-22 11:10:34 +00:00
mk-pmb
be008e657d [core] Fix format string injection for metadata JSON filename message. 2023-12-06 02:45:41 +00:00
Robotix
b1bbc1e502
[Epidemic Sound] Add new extractor (#32628)
* Add simple extractor
* Support separate tracks
* Use index as id instead of slug

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-12-06 01:17:57 +00:00
dirkf
55a442adae
[Imgur] Overhaul extractor module (#32612)
Revise extractors for new API and page formats
2023-12-05 20:02:30 +00:00
mimvahedi
c62936a5f2
[telewebion] Fix extraction (#32634)
* [telewebion] fix extraction

Resolves https://github.com/ytdl-org/youtube-dl/issues/5135#issuecomment-932952119

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-12-02 15:25:09 +00:00
dirkf
427472351c [utils] Make restricted filenames ignore characters in Unicode categories Mark, Other
Resolves #32629
2023-11-29 22:08:01 +00:00
ReenigneArcher
b7fca0fab3 [Youtube] Update consent cookie handling to match site
Apologies for force push!
[skip ci]
2023-11-29 21:43:02 +00:00
dirkf
00ef748cc0 [downloader] Fix baa6c5e: show ETA of http download as ETA instead of total d/l time 2023-09-24 22:07:47 +01:00
dirkf
66ab0814c4 [utils] Revert bbd3e7e, updating docstring, test instead 2023-09-03 23:15:19 +01:00
dirkf
bbd3e7e999 [utils] Properly handle list values in update_url()
An actual list value in a query update could have been treated
as a list of values because of the key:list parse_qs format.
2023-09-03 01:18:22 +01:00
dirkf
31f50c8194 [S4C] Add thumbnail extraction, extract series as playlist
Based on https://github.com/yt-dlp/yt-dlp/pull/7776: thx ifan-t, bashonly
2023-08-31 23:16:50 +01:00
dirkf
86e3cf5e58 [S4C] Add extractor for Sianel Pedwar Cymru
* from https://github.com/yt-dlp/yt-dlp/pull/7730, thx ifan-t, bashonly
2023-08-04 22:54:12 +01:00
dirkf
2efc8de4d2 [utils] Advertise optional supported Content-Encodings 2023-08-01 01:05:09 +01:00
dirkf
e4178b5af3 [utils] Add and use filter_dict() from yt-dlp 2023-08-01 01:05:09 +01:00
dirkf
2d2a4bc832 [utils] Revise isinstance() tests (especially for str/unicode/bytes) to complete Linter fix 2023-08-01 01:05:09 +01:00
dirkf
7d965e6b65 [utils] Avoid comparing type(var), etc, to pass new Linter rules 2023-08-01 01:05:09 +01:00