892 views
 owned this note
2024-04-24 21:00 (9pm) === :::info ***Next meeting: 2024-05-06 18:00 CET (6pm)*** ::: # Episode identification :::info **Current state: https://pad.funkwhale.audio/s/T-yx14DsH#Episodes-endpoint** **Last meeting: https://pad.funkwhale.audio/s/6mWuDexgz#Episode-GUIDs** ::: | test case βœ–βœ” | πŸŽ‰Static fetch-hash / generated GUIDπŸŽ‰ | Dynamic fetch-hash | GUID from the feed | | :--- | :---: | :---: | :---: | | **rss feed without episode guids (old)**<br>IDs need to be generated once | βœ” | βœ” | βœ” | | **rss feed with 2 duplicate guids (old)**<br>~~We would see them as the same episode in each case - but we'd accept this (consider them effectively the same episde, and sync changes between the 'two' episodes)~~ The client can decide whether the episodes are the same or whether they are different - if they are different, GUIDs from the feed cannot be used!! | βœ” | βœ” | βœ– | | **guid changes for a given episode in the rss feed (old)**<br>Deduplication endpoint necessary, deduplicate client-side? | βœ” | βœ– | βœ” | **New test case 1:** * listen to episode * phone dies * podcast publisher changes episode guid * we get a new phone & connect to the database again; it gets everything from the server --> server doesn't know of new guid, and phone doesn't know of old one Here we'd get a duplicate. When the client sends the 'create episode' call to the server, the server recognises the possible duplicate and 1. creates the episode 2. responds to the client with a question: do you want to deduplicate? [Discussion about deduplication from 2023-05-30](https://pad.funkwhale.audio/oCfs5kJ6QTu02d_oVHW7DA#Deduplication) What's the expectation regarding the guids to align? We have discussed a deduplciation endpoint, this would be a good use-case for that. How can client know how to deduplicate, if it doesn't receive any metadata from the server, only a guid? Impossible to compare with RSS feed if it has nothing to compare with. AP: title, if mostly similar then check media type, date & duration. If those are the same as well, then we identify as duplicate. But title usually the most useful one; media file URL (enclosure) will most likely change. What data should the server store for deduplication? 1. Title 2. Release date 3. Enclosure URL? 4. Duration? Other question, for future: which fields should be checked for similarity (when the client synchronises) - to avoid noise. **New test case 2:** * podcaster publishes both DE & EN episodes. In 3 feeds: EN-only, DE-only, mixed. Episode guids created as different ones between * What if you switch from DE-only to EN-only feed? Deduplication MUST always only happen within a single feed, not across feeds. If the feed URL changes, it doesn't necessarily mean that the feed's GUID changes. AP: * if podcast publisher knows what they're doing and has redirect, episode GUIDs remain the same * if not: * if episode guid in the feed is still the same, we keep as is * if episode guids in feed are different, we would consider them as 'new' episodes and execute deduplication **New test case 3:** If you switch, within same feed/subscription, from type a (e.g. video) to type b (e.g. audio). **Question about how server stores episode data:** If you merge episodes, this is per-user action. This means that there is duplication between users? Indeed. Then extra episode metadata It would affect storage space. Database structuring can be done smartly; e.g. have single table per user with all data, or split the data between multiple tables. As long as API specs are respected, server can do whatever. E.g. single-user server would not separate this all, if you're doing a multi-user database you'll want to be smarter. If in the latter case, deduplication might create a single new episode that's only used by one user. There's multiple ways to do it. We can put a preamble recommendation to server implementers to do Who is responsible for calculating/creating the spec episode GUID? * [2023-07-11 episode data fields - server can calculate GUID](https://pad.funkwhale.audio/FkuIqtPGT-ynYKqBieHffw#Data) * have to specify: has to be 'globally' unique within a feed * probably server should when receiving instructions from client to create new episode or patch one, and it sends an updated GUID, it should check it is actually unique within the feed * We should establish clearly separate naming; separate from feed GUID. Proposal: **sync-UUID** Conclusion:(?) we want to keep a separate sync guid? Yes, because: * when clients have different deduplication standards * it can help multi-user servers to have single episode entries and use guid for sync. (Although if allow the client to generate sync-UUIDs, they would become unique across the server.) **Question: informing server if client finds new episode** Really needed to tell server about episode as soon as it was found? Or can it be only when there's a meaningful action (e.g. play). That would save many calls to the server, and sync-IDs that have to be stored in the client. Potential problem: if episode was played already on client A, and then client pu -> Solution: allow client to request server to provide 'changed since' data. Should we support feeds who don't keep historical data? E.g. episodes that are no longer in feed but already 'Downloaded' status should also be considered an action that acts as trigger to inform the server of the episode. Conclusion: * don't push all episodes to sync server * if client syncs episode timestamp, but doesn't know the episode sync-ID yet, then client needs a way to request additional information about the episode to locally match received sync ID and local episodes * should have 'native' calls "I played episode but don't have a sync ID yet" and "I played episode with this sync ID" * **"lazy synchronization"**: only sync episode data after interaction with episode -> sync-ID fields stay empty until interaction happens