throbber
UNI
`TED STATES PATENT AND TRADEMARK OFFICE
`BE
`FORE THE PATENT TRIAL AND APPEAL BOARD
`GOOGL
`E LLC,
`Petitioner,
`v.
`C
`ELLULAR SOUTH, INC.,
`Patent Owner.
`C
`ase IPR2025-00877
`Patent 11,126,853 B2
`Issue Date: September 21, 2021
`Title
`: VIDEO TO DATA
`DE
`CLARATION OF HENRY HOUH, PH.D.
`PETITION FOR INTER PARTES REVIEW
`IPR2025-00877
`Patent Owner 2013
`Page 1 of 70
`
`
`
`
`
`
`
`* * *
`
`CB. Ground 1: Claims 1-4 Are Obvious Over Zhao in View of Kritt,
`Steinberg, and Kouzani
`1. Independent Claim 1: “A system for generating data from a
`video, comprising:” (Claim 1[pre])
`93. The preamble of claim 1 recites “[a] system for generating data from
`a video.” Assuming the preamble provides a claim limitation, Zhao discloses it.
`Zhao discloses a system that performs detection and recognition of objects, such as
`people and faces, in video images to generate data, including by creating and
`updating object models used for detection and recognition, and by generating data
`about the video that can be used for indexing and retrieval. (Houh, ¶¶93-94.) Figure
`2 of Zhao, discussed which I wi ll discuss in more detail in the my analysis below,
`illustrates at a high level this overall process of generating data from a video
`performed by the system:
`
`IPR2025-00877
`Patent Owner 2013
`Page 2 of 70
`
`
`
`
`
`
`
`
`(Zhao, Fig. 2; see also id., ¶0036 (“FIG. 2 illustrates a flowchart describing the steps
`involved, from a high-level, in one embodiment of the present system for detecting,
`modeling, recognizing, and tracking object images throughout one or more
`videos.”).)
`94. Zhao explains, for example, that it “generally relate[s] to systems and
`methods for detection, modeling, recognition, and tracking of objects within video
`content” and to “indexing and retrieval systems for videos based on generated object
`models.” (Zhao, ¶0051.) “These objects include people, faces, articles of clothing,
`IPR2025-00877
`Patent Owner 2013
`Page 3 of 70
`
`
`
`
`
`
`
`plants, animals, machinery, electronic equipment, food, and virtually any other type
`of image that can be captured or presented in video.” (Zhao, ¶0051.) Zhao’s system
`“may be operated in a computer environment” and “any results or outputs relating
`to dete ction, modeling, recognition, indexing, and/or tracking of object images
`within videos may be stored in a database” or otherwise output. (Zhao, ¶0055.) Zhao
`states that its system is “useful for a wide variety of applications, including video
`indexing and retrieval, video surveillance and security, unknown person
`identification, advertising, and many other fields.” (Zhao, ¶0053.)
`95. Therefore, Zhao discloses “ a system for generating data from a
`video.” And as explained below, the system as claimed is obvious over Zhao, Kritt,
`Steinberg, and Kouzani.
`(a). “a coordinator communicatively coupled to a splitter
`and to a plurality of demultiplexer nodes, wherein the
`splitter is configured to segment the video, wherein the
`demultiplexer nodes are configured to extract audio
`files from the video and to extract still frame images
`from the video;” (Claim 1[a])
`96. As explained below, claim 1[a] is disclosed by and obvious over Zhao
`in view of Kritt. (Houh, ¶¶96-113.)
`97. For context, the ’853 patent describes a “coordinator,” “splitter,” and
`“demultiplexer” in terms of functions they may perform, and does not state any
`particular form or structure. (Houh, ¶97.) The ’853 patent states, for instance, that
`“[s]ource media can be provided to a coordinator, which can be a non- visual
`IPR2025-00877
`Patent Owner 2013
`Page 4 of 70
`
`
`
`
`
`
`
`component that can perform one or more of several functions.” (’853, 16:26-27.) For
`example, “[t]he coordinator can direct distributed processing and aggregation.”
`(’853, 18:16-17.) Referring to figure 10, reproduced below, the ’853 patent states
`that “the coordinator can send uploaded source media to a splitter and a
`demuxer/demultiplexer”:
`
`
`
`(’853, Fig. 10, 16:60- 63; see also id. , 3:50- 51.) A “ splitter” can be invoked to
`“‘slice’ [media] assets into media segments and/or multiple sub assets comprising
`the entire stream.” (’853, 17:32-37.) A “demultiplexer” can “extract audio files
`IPR2025-00877
`Patent Owner 2013
`Page 5 of 70
`
`
`
`
`
`
`
`from the video and/or to extract still frame images from the video.” (’853, 1:45-47.)
`For example, “a demultiplexer component can strip audio data from segments and
`store audio streams for future analysis.” (’853, 17:61-62.)
`98. A person of ordinary skill in the art would recognize that the
`arrangement of a “coordinator ,” “splitter ,” and “plurality of demultiplexer
`nodes” recited in claim 1[a] is merely an application of the well-known concept of
`distributed processing (or multiprocessing) to video processing, to obtain the
`obvious benefits that processing time can be reduced, and more processing overall
`can be performed, by sharing the workload and performing multiple operations in a
`parallel fashion. (Houh, ¶98.) The ’853 pat ent likewise explains that “the splitter
`can be configured to segment the video,” then later states, for example, that “[t]he
`video-to-data engine can segment the video into chunks for distributed, or parallel,
`processing….Distributed processing in this context can mean that the processing
`time for analyzing a video from beginning to end is a fraction of the play time of the
`video. This can be accomplished by breaking the processes into sections and
`processing them simultaneously.” (’853, 2:35- 36, 6:15- 21.) By 2016,
`distributed/multi processing was well -known and neither novel nor non- obvious.
`(Houh, ¶98 (citing EX1012EX1012 (Microsoft Computer Dictionary 5th ed. 2002),
`pp.005 (entry for “distributed processing” explaining that “distributed processing
`shares the workload among computers that can communicate with one another” ),
`IPR2025-00877
`Patent Owner 2013
`Page 6 of 70
`
`
`
`
`
`
`
`006 (entry for “multiprocessing” ) explaining “objective is increased speed or
`computing power”).) And as explained below, claim 1 [a] is obvious over Zhao in
`view of Kritt.
`99. Turning to Zhao, PetitionerI will first explain how Zhao teaches both
`“splitter” operations and “demultiplexer” operations. PetitionerI will then explain
`how Zhao discloses and renders obvious a “ coordinator” before turning to the
`combination with Kritt.
`100. Zhao itself discloses a “splitter ” “configured to segment the video ”
`in the form of processes that perform “shot/scene boundary detection” to split a video
`into shots or scenes that are extracted for processing. (Houh, ¶100.) Zhao explains:
`Once the video has been retrieved, the video undergoes a process of
`detection of objects and extraction of object features from video images
`(processes 210, 220). Concurrently, shots or scenes are detected in the
`video via a shot/scene boundary detection procedure (process 215). The
`shots or scenes of the video are extracted to provide a sequence of
`images for analysis by the system, and each detected shot or scene is
`individually analyzed. Generally, the term “shot” or “scene” refers to a
`grouping of frames or images recorded by either a stationary or
`smoothly-moving camera, with little or no background change between
`the frames, corresponding to one continuous time period.
`(Zhao, ¶0057.) Figure 2 of Zhao illustrates this process with shot boundary detection
`process 215:
`IPR2025-00877
`Patent Owner 2013
`Page 7 of 70
`
`
`
`
`
`
`
`
`(Zhao, Fig. 2 (highlighted added).)
`101. With respect to the “demultiplexer,” Zhao also discloses at least some
`of the claimed “demultiplexer ” functionality—i.e., “ extract[ing] still frame
`images from the video .” (Houh, ¶101.) Zhao describes “receiving a video file,
`wherein the video file comprises a plurality of video frames.” (Zhao, ¶0023.) Zhao
`teaches that the object detection and recognition processes involve extracting still
`frame images from the video. (Houh, ¶101.) For example, Zhao explains that the
`object recognition process is performed on an “image or frame extracted from a
`IPR2025-00877
`Patent Owner 2013
`Page 8 of 70
`
`
`
`
`
`
`
`video.”2 (Zhao, ¶0075 (“The process 400 shown in FIG. 4 is for one image or frame
`extracted from a video, but as will be appreciated, this process is typically repeated
`for each optimal object image in the video.”).) Zhao similarly explains that the object
`detection process involves “detection of objects and extraction of object features
`from video images (processes 210, 220).” (Zhao, ¶0057; see also id., ¶0059 (“Once
`an object has been detected within a video frame…”).)
`102. With respect to “demultiplexer” functionality to “extract audio files
`from the video,” Petitioner notesI note that the claims of the ’853 patent do not
`recite any additional limitations related to “audio” or the extracted “audio files.”
`(Houh, ¶102.) Zhao likewise does not discuss audio information, which is not
`surprising because audio processing is not a focus of Zhao’s purported invention.
`(Id.) But it would have been obvious for Zhao’s system to further include processes
`for “extracting audio files from the video.” (Id.) For example, Zhao discloses
`performing its techniques on videos such as episodes of television shows such as the
`Gilmore Girls (Zhao, e.g., ¶¶0040-0041, 0093, Fig. 6), which a person of ordinary
`skill in the art would have understood and found obvious to have included one or
`more audio tracks. (Houh, ¶102.) Zhao more generally teaches that its techniques
`are “useful for a wide variety of applications, including video indexing and retrieval,
`
`
`2 All underlining added to quoted material unless noted.
`IPR2025-00877
`Patent Owner 2013
`Page 9 of 70
`
`
`
`
`
`
`
`video surveillance and security, unknown person identification, advertising, and
`many other fields.” (Zhao, ¶0053.) Given Zhao’s broad applicability, a person of
`ordinary skill in the art would have readily appreciated and found it obvious that
`videos processed by Zhao’s system would include audio tracks. (Houh, ¶102.) A
`person of ordinary skill in the art would have found it obvious to extract audio tracks
`from the video into separate audio files —for example, to separately store or
`analyze—but again, those sorts of preparatory operations are not a focus of Zhao.
`(Id.) For example, a person of ordinary skill in the art would have understood and
`found it obvious to perform operations such as speech recognition or speaker
`recognition on the audio data. (Id. ) In a security and surveillance context (and
`numerous other contexts), for instance, these operations could be used to further
`confirm a person’s identity or to make a record of conversations associated with the
`captured video. (Id. ) For example, Kritt describes using audio to “validate that
`the human object was correctly identified” in a video. ( Id. (quoting Kritt, ¶0033).)
`(“[A] visual tag may indicate that a particular human object appears in a shot and an
`audio tag identifies an audio signature of the human object is associated with the
`shot. In these examples, if the tags that are compared indicate the same object, the
`positive or consistent result of the comparison may be used…to validate that the
`human object was correctly identified.”).)
`103. A step of extracting audio tracks from a video file in Zhao would have
`IPR2025-00877
`Patent Owner 2013
`Page 10 of 70
`
`
`
`
`
`
`
`resulted in “ extract[ing] audio files” in at least two separate ways. (Houh, ¶103.)
`First, it was well-knownwell known to a person of ordinary skill in the art that video
`file formats took the form of “container” files (or “wrapper” files) that in turn stored
`video tracks, audio tracks, and other data as files within that container —much like
`how a single “.zip” archive file can store multiple files. (Id. ) For example, Kritt
`explains in its Background section:
`A television show, movie, internet video, or other similar content may
`be stored on a disc or in other memory using a container or wrapper file
`format. The container format may be used to specify how multiple
`different data files are to be used. The container format for a video may
`identify different data types and describe how they are to be interleaved
`when the video is played. A container may contain video files, audio
`files, subtitle files, chapter-information files, metadata, and other files.
`A container also typically includes a file that specifies synchronization
`information needed for simultaneous playback of the various files.
`(Kritt, ¶0002.) Similarly, the 2010 book , Pro HTML5 Programming, by Peter
`Lubbers et al., explains that “[a]n audio or video file is really just a container file,
`similar to a ZIP archive file that contains a number of files.” (EX1011, p.0013.)
`Figure 3-1 from the book, reproduced below, “shows how a video file (a video
`container) contains audio tracks, video tracks, and additional metadata. The audio
`and video tracks are combined at runtime to play the video.” (EX1011, p.0013.)
`IPR2025-00877
`Patent Owner 2013
`Page 11 of 70
`
`
`
`
`
`
`
`
`(EX1011, p.0014.) Therefore, a person of ordinary skill in the art would have
`understood that, by extracting audio tracks from a video file, a step of “extract[ing]
`audio files from the video” would be performed. (Houh, ¶103.)
`104. Second, by extracting audio from the video for separate analysis or
`storage, a person of ordinary skill in the art would have understood and found it
`obvious that the resulting extracted audio would take the form of “files.” (Houh,
`¶104.) This is because, for starters, file storage was one of a finite number of ways
`of storing data. (Id.) Moreover, it was routine to use files—known as “scratch files”
`or “temporary files”—for at least temporary storage while the associated data was
`IPR2025-00877
`Patent Owner 2013
`Page 12 of 70
`
`
`
`
`
`
`
`processed. (Id. (citing EX1012See, e.g., EX1012 (Microsoft Computer Dictionary
`(5th ed. 2002)) , pp.007 (entries for “scratch” and “scratch file”), 008 (entry for
`“temporary file”)).)
`105. Turning back to the claimed “coordinator,” a person of ordinary skill
`in the art would have further understood that, in order to manage the various video
`file retrieval, splitter, and demultiplexer operations, Zhao’s system would also
`include “coordinator” functionality. (Houh, ¶105.) For example, referring to figure
`2, Zhao
` shows and explains how splitter operations (e.g., process 215) and
`demultiplexer operations (e.g., processes 210, 400) take place concurrently:
`Once the video has been retrieved, the video undergoes a process of
`detection of objects and extraction of object features from video images
`(processes 210, 220). Concurrently, shots or scenes are detected in the
`video via a shot/scene boundary detection procedure (process 215).
`IPR2025-00877
`Patent Owner 2013
`Page 13 of 70
`
`
`
`
`
`
`
`
`
`(Zhao, Fig. 2 (highlightingannotations added), ¶0057.) A person of ordinary skill in
`the art would have understood that various operations would need to be performed
`in order to retrieve a video file and then distribute the file for concurrent processing
`by processes 215 and 210, for example. (Houh, ¶105.) Those operations, which I
`have highlighted in figure 2 of Zhao below, would form at least part of a
`“coordinator.”
`
`IPR2025-00877
`Patent Owner 2013
`Page 14 of 70
`
`
`
`
`
`
`
`
`(Zhao, Fig. 2 (highlightingannotations added); Houh, ¶105.) A person of ordinary
`skill in the art would have further understood that, for Zhao’s “coordinator”
`functionality to distribute the video file that it retrieved, it would need to be
`“communicatively coupled” to the system’s “splitter” and “demultiplexer”
`functionality in order to notify those components in some manner. (Houh, ¶105.) In
`fact, Zhao’s disclosure of “concurrent” processing (Zhao, ¶0057) suggests that the
`video file is actually distributed to separate nodes (such as separate processors or
`computer systems) with which Zhao’s “coordinator” would be communicatively
`coupled. (Houh, ¶105.) This is just like “coordinator” functionality described in the
`IPR2025-00877
`Patent Owner 2013
`Page 15 of 70
`
`
`
`
`
`
`
`’853 patent, which describes receiving a media file and distributing it to a splitter and
`demultiplexer. (Id. (citing ’853, 16:60- 63 (“the coordinator can send uploaded
`source media to a splitter and a demuxer/demultiplexer”)).)
`106. Lastly, before turning to the combination with Kritt, with respect to the
`claimed “ plurality of demultiplexer nodes,” this too is suggested by Zhao.
`(Houh, ¶106.) Although Zhao focuses on describing its system with respect to a
`single video file, Zhao explains that “the process may be extrapolated to analyze a
`plurality of videos.” (Zhao, ¶0057.) Because as noted above, Zhao also describes
`“concurrent” processing, a person of ordinary skill in the art would have therefore
`understood and found it obvious for Zhao’s system, in an implementation for
`processing a higher volume of video files, to comprise a “ plurality” of nodes
`providing the claimed demultiplexer functionality. (Houh, ¶106.) As discussed
`above, the benefits of distributed computing were well known and therefore would
`have been obvious to try. (Id E. (citingg., EX1012 (Microsoft Computer Dictionary
`5th ed. 2002) , pp.005 (entry for “distributed processing”), 006 (entry for
`“multiprocessing”)).)
`107. Turning to the combination with Kritt, like Zhao, Kritt describes a
`system for processing both visual and audio data of videos to generate data about the
`videos. (Houh, ¶107 (citing Kritt, ¶¶0017 (“a viewer selects a video, and human,
`nonhuman, and audio objects of the video are identified”For example:
`IPR2025-00877
`Patent Owner 2013
`Page 16 of 70
`
`
`
`
`
`
`
`According to various embodiments, a viewer selects a video, and
`human, nonhuman, and audio objects of the video are identified. In
`addition, key words that are spoken by human objects in the video are
`identified. Human, nonhuman, and audio objects may be used to
`classify a particular segment of a video as a scene. The objects and key
`words are then associated with the scenes of the video.
`), (Kritt, ¶0017.) As Kritt notes in the passage above, both visual and audio data are
`processed. ( See also Kritt, ¶¶ 0043 (“visual objects in a shot may be identified by
`application of one or more known image recognition processes to the shot”), 0059
`(“Segments of the video for which the audio feature values are similar to those that
`are characteristic of speech may be classified as speech.”), 0065 (“In operation 612,
`an audio transcript may be determined. An audio transcript may include all of the
`words spoken in the video….[S]poken words may be determined from audio features
`classified as speech using any known technique.”)).)
`108. With respect to claim 1[a] specifically, and concepts of
`distributed/multi processing, Kritt describes modules for processing video and audio
`of a video file, and that those modules can be distributed on different computer
`systems:
`The memory 104 may store all or a portion of the following: an audio
`visual file container 150 (shown in FIG. 2 as container 202), a video
`processing module 152, an audio processing module 154, and a control
`module 156. These modules are illustrated as being included within the
`IPR2025-00877
`Patent Owner 2013
`Page 17 of 70
`
`
`
`
`
`
`
`memory 104 in the computer system 100, however, in other
`embodiments, some or all of them may be on different computer
`systems and may be accessed remotely, e.g., via a network.
`(Kritt, ¶0022; see also id., ¶0024 (“The video processing module 152 may include
`various processes that generate visual tags according to one embodiment. The audio
`processing module 154 may include various processes for generating audio and key
`word tags according to one embodiment.”); Houh, ¶108.)
`109. Kritt further describes how its systems could be provided using “cloud-
`computing infrastructure.” (Kritt, ¶¶0078-0079.) Kritt explains, as was known to a
`person of ordinary skill in the art , that “[c]loud computing generally refers to the
`provision of scalable computing resources as a service over a network.” (Kritt,
`¶0078; Houh, ¶109.) “More formally,” Kritt explains, “cloud computing may be
`defined as a computing capability that provides an abstraction between the
`computing resource and its underlying technical architecture (e.g., servers, storage,
`networks), enabling convenient, on-demand network access to a shared pool of
`configurable computing resources that can be rapidly provisioned and released with
`minimal management effort or service provider interaction.” (Kritt, ¶0078.) Kritt
`further explains that one benefit of cloud computing is that “[t]ypically, cloud-
`computing resources are provided to a user on a pay-per-use basis, where users
`are charged only for the computing resources actually used (e.g., an amount of
`storage space used by a user or a number of virtualized systems instantiated by the
`IPR2025-00877
`Patent Owner 2013
`Page 18 of 70
`
`
`
`
`
`
`
`user).” (Kritt, ¶0079.) It also allows a user to “access any of the resources that reside
`in the cloud at any time, and from anywhere across the Internet.” (Id Kritt, ¶0079.)
`As discussed above for Zhao, a person of ordinary skill in the art would have
`recognized that in such a distributed/multi processing system, “ coordinator”
`functionality would be needed to manage the operations and workloads among the
`various video and audio processing nodes. (Houh, ¶109.)
`110. Rationale and Motivation to Combine (Zhao with Kritt). It would
`have been obvious to a person of ordinary skill in the art to incorporate Kritt’s
`teachings regarding using a distributed/multi processing system (such as a pool of
`nodes in a cloud -computing system) for processing images and audio into Zhao’s
`video processing system. (Houh, ¶110.) The combination would have involved the
`conventional application of distributed processing techniques to Zhao’s system and
`would have predictably resulted in Zhao’s system in which multiple nodes were used
`for processing images and audio of a retrieved video file, along with functionality
`that managed that processing. (Id.) The combination therefore would have resulted
`in “a coordinator”—i.e., functionality for managing the overall distributed process
`of analyzing images and audio of a video file—“communicatively coupled to
`a splitter”—i.e., functionality to split a video into shots/segments —“and to a
`plurality of demultiplexer nodes ”—e.g., multiple nodes in a cloud computing
`infrastructure for performing image and audio recognition. (Id.) The combination
`IPR2025-00877
`Patent Owner 2013
`Page 19 of 70
`
`
`
`
`
`
`
`therefore would have resulted in a “splitter [] configured to segment the video,”
`and “demultiplexer nodes [] configured to extract audio files from the video and
`to extract still frame images from the video.” (Id. )
`111. Zhao and Kritt are analogous references to the ’853 patent, being. Like
`the ’853 patent, both Zhao and Kritt are in the same field of video processing and
`generating data from video content, and also. (’853, 1:34-36 (“The present invention
`is generally directed to a method to generate data from video content, such as text
`and/or image-related information.”); e.g., Zhao, ¶0036, Fig. 2; e.g., Kritt, ¶0017.)
`Zhao and Kritt would have also been reasonably pertinent to problems facing the ’853
`inventors. (Houh, ¶111 (citing ’853, 1:34-36; Zhao, ¶0036, Fig. 2; Kritt, ¶0017).) ,
`including techniques for recognizing objects in video.
`112. A person of ordinary skill in the art would have been motivated to
`combine. (Houh, ¶112.) As noted above, although Zhao focuses on describing its
`system with respect to a single video file, Zhao explains that “the process may be
`extrapolated to analyze a plurality of videos.” (Zhao, ¶0057.) Zhao also teaches that
`its techniques are “useful for a wide variety of applications, including video indexing
`and retrieval, video surveillance and security, unknown person identification,
`advertising, and many other fields.” (Zhao, ¶0053.) A person of ordinary skill would
`have therefore recognized that an application of Zhao’s system could involve high-
`volume video processing, along with potential time constraints —for example, a
`IPR2025-00877
`Patent Owner 2013
`Page 20 of 70
`
`
`
`
`
`
`
`large-scale video indexing and retrieval website, or a commercial video surveillance
`and security service. (Houh, ¶112.) A person of ordinary skill in the art would have
`therefore looked to an analogous reference such as Kritt in the same field as Zhao
`that provides further teachings regarding system architectures capable of providing
`such a larger-scale service. (Id.) Kritt also provides express motivation to combine,
`describing numerous advantages of a cloud-based system, including “convenient,
`on-demand network access to a shared pool of configurable computing resources
`that can be rapidly provisioned and released wit h minimal management effort or
`service provider interaction,” “resources [] provided to a user on a pay-per -use
`basis,” and “access any of the resources that reside in the cloud at any time, and
`from anywhere across the Internet.” (Kritt, ¶¶0078-0079; Houh, ¶112.) Moreover,
`with respect to audio processing, as explained above, Kritt also explains integrating
`audio and image processing would beneficially enable confirmation of recognized
`visual objects in the video, as well as a transcript that could be used for further
`analysis of the video. (Kritt, e.g., ¶¶0033, 0065; Houh, ¶112 (using audio to “validate
`that the human object was correctly identified” in a video), 0065 (audio transcript).)
`113. A person of ordinary skill in the art would have had a reasonable
`expectation that the combination would have been successful. (Houh, ¶113.) It
`would have involved the straightforward application of conventional techniques for
`distributed processing (and audio processing). (Id.) As Kritt notes, cloud computing
`IPR2025-00877
`Patent Owner 2013
`Page 21 of 70
`
`
`
`
`
`
`
`for example was already well-established, as were techniques for speech and speaker
`recognition.
`(b). “an image detector configured to detect an image of an
`object in the still frame images, wherein the image
`detector is adjustable to increase detection of non -
`primary images in the video; and” (Claim 1[b])
`114. As explained below, claim 1[b] is obvious over Zhao in view of
`Steinberg. (Houh, ¶¶114-124.)
`115. The ’853 patent describes image “detection” and “recognition” as a
`two-step process, first “detection,” then “recognition.”:
`FIG. 8 illustrates a flow diagram of an embodiment of image
`recognition. Images can be classified as faces and/or objects. Image
`recognition can include two components: image detection 800 and
`image recognition 810. Image detection can be utilized to determine if
`there is a pattern or patterns in an image that meet the criteria of a face,
`image, or text. If the result is positive, the detection processing then
`moves to recognition, i.e. image matching.
`(’853, 13:39-46.) Zhao similarly discloses separate detection and recognition steps.
`(Houh, ¶¶115-116.)
`116. With respect to image detection, Zhao discloses “an image detector
`configured to detect an image of an object in the still frame images ”—such as
`processes to detect images of objects such as faces in extracted still frame images of
`a video. (Houh, ¶116.) Referring to Figure 2, reproduced below, Zhao explains that
`IPR2025-00877
`Patent Owner 2013
`Page 22 of 70
`
`
`
`
`
`
`
`“[o]nce the video has been retrieved, the video undergoes a process of detection of
`objects and extraction of object features from video images (processes 210, 220)”
`(Zhao, ¶0057):
`
`(Zhao, Fig. 2 (highlighting added).) Zhao describes in detail image detection process
` 210 to detect an object image from video images. (:
`At process 210 , object images are detected from video scenes using
`wavelet features of the images in combination with an Adaboost
`classifier…. As described herein a “feature” or “local feature” generally
`IPR2025-00877
`Patent Owner 2013
`Page 23 of 70
`
`
`
`
`
`
`
`describes an element of significance within an image that enables
`recognition of an object within the image. A feature typically describes
`a specific structure in an image, ranging from simple structures such as
`points, edges, corners, and blobs, to more complex structures such as
`entire objects. For example, for facial recognition, features include
`eyes, noses, mouths, ears, chins, etc., of a face in an image, as well as
`the corners, curves, color, overall shape, etc., associated with each
`feature. Features are d escribed in greater detail below and throughout
`this document. Generally, an AdaBoost (short for “Adaptive Boosting”)
`classifier refers to a machine-learning algorithm, and may be used in
`conjunction with other algorithms to improve overall performance. The
`AdaBoost classifier assists the system in learning to identify and detect
`certain types of objects, as described in greater detail below.
`(Zhao, ¶0058.)
`117. Claim 1[b] further recites “wherein the image detector is adjustable
`to increase detection of non-primary images in the video.” Zhao does not disclose
`image detection functionality that is “ adjustable to increase detection of non -
`primary images in the video,” but this would have been obvious in view of
`Steinberg. (Houh, ¶¶117-124.)
`118. Zhao does describe techniques for a system operator to adjust settings
`based on desired accuracy and/or precision of recognition and/or object tracking,
`and also acknowledges the issue of false facial recognitions produced by extras or
`crowds in a movie. (Houh, ¶118.) For example, in connection with an object
`IPR2025-00877
`Patent Owner 2013
`Page 24 of 70
`
`
`
`
`
`
`
`tracking/recognition process, Zhao explains how a system operator can adjust a
`similarity threshold. (Zhao, ¶0089 (“[T]he threshold value may be raised or lowered
`by a system operator depending on whether the operator would rather have higher
`accuracy, or more identifications.”):
`At step 525, each average confidence score or similarity measure is
`compared to a predetermined threshold. In one embodiment, the
`predetermined threshold is set by a system operator based on a desired
`accuracy and/or precision of recognition and/or tracking. For example,
`a higher threshold yields more accurate results because a similarity
`measure must exceed that threshold to be identified as matching the
`given model. Thus, if a detected object group and an identified object
`model have a high similarity value, the re is therefore a higher
`percentage chance that the object group was correctly recognized.
`However, a higher threshold value also produces a lower recall, as some
`images or groups that were in fact correctly recognized are discarded
`or ignored because they do not meet a high similarity threshold (i.e. the
`group was correctly recognized, but had too many differences from the
`model based on its images’ poses, resolutions, occlusions, etc.).
`Alternatively, a low threshold leads to higher recognition of images as
`matching a given model, but it is also likely to produce a higher
`percentage of false identifications. As will be understood, the threshold
`value may be raised or lowered by a system operator depending on
`whether the operator would rather have higher accuracy, or more
`identifications.
`(Zhao, ¶0089.) Zhao also teaches that whether a high or low similarity threshold is
`IPR2025-00877
`Patent Owner 2013
`Page 25 of 70
`
`
`
`
`
`
`
`desirable depends on the video content —for example, “in a facial recognition
`embodiment, extras or crowds in a movie often produce false recognitions.” (Zhao,
`¶0091; Houh, ¶118.):
`In addition to the purposes and advantages of object tracking described
`above, a further benefit is that false positives of object recognitions are
`reduced because they likely do not fit into clearly established groups in
`a given video. For example, in a facial recognition embodiment, extras
`or crowds in a movie often produce false recognitions. However,
`because these recognitions are likely to be random and cannot be
`tracked over time (based on inconsistent or infrequent occurrences in a
`video), these false recognitions can be discarded as not belonging to a
`distinct group, and thus can be assumed to be false positives.
`(Zhao, ¶0091.)
`119. Like Zhao, Steinberg also relates to facial detection in digital images.
`(Houh, ¶119 (citing Steinberg notes that “[t]he problem of face detection has not
`received a great deal of attention from researchers. Most conventional techniques
`concentrate on face recognition, assuming that a reg

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket