U.S. Patent No. 11,126,853 B2

UNITED STATES PATENT AND TRADEMARK OFFICE

BEFORE THE PATENT TRIAL AND APPEAL BOARD

GOOGLE LLC,
Petitioner,

v.

CELLULAR SOUTH, INC.,
Patent Owner.

Case IPR2025-00877
Patent 11,126,853 B2
Issue Date: September 21, 2021

Title: VIDEO TO DATA

PETITION FOR INTER PARTES REVIEW
OF U.S. PATENT NO. 11,126,853 B2
`
Table of Contents

I. MANDATORY NOTICES UNDER §42.8(A)(1)
    A. Real Party-In-Interest under §42.8(b)(1)
    B. Related Matters under §42.8(b)(2)
    C. Lead and Back-Up Counsel under §42.8(b)(3)
    D. Service Information
II. FEE PAYMENT
III. REQUIREMENTS UNDER §§ 42.104 AND 42.108 AND CONSIDERATIONS UNDER §§ 314(A) AND 325(D)
    A. Grounds for Standing
    B. Identification of Challenge and Statement of Precise Relief Requested
    C. Considerations Under §§ 314(a) and 325(d)
IV. OVERVIEW OF THE PATENT
    A. Level of Ordinary Skill
    B. Specification Overview
V. CLAIM CONSTRUCTION
VI. THE CHALLENGED CLAIMS ARE UNPATENTABLE
    A. Overview of Grounds
    B. Prior Art Status of Relied-Upon References
    C. Ground 1: Claims 1-4 Are Obvious Over Zhao in View of Kritt, Steinberg, and Kouzani
        Independent Claim 1: “A system for generating data from a video, comprising:” (Claim 1[pre])
            (a) “a coordinator communicatively coupled to a splitter and to a plurality of demultiplexer nodes, wherein the splitter is configured to segment the video, wherein the demultiplexer nodes are configured to extract audio files from the video and to extract still frame images from the video;” (Claim 1[a])
            (b) “an image detector configured to detect an image of an object in the still frame images, wherein the image detector is adjustable to increase detection of non-primary images in the video; and” (Claim 1[b])
            (c) “an object recognizer configured to compare the image of the object to a fractal, wherein the fractal includes a representation of the object based on landmarks associated with the object, wherein the recognizer is further configured to update the fractal with the image.” (Claim 1[c])
        Claim 2: “The system of claim 1, wherein the coordinator is configured to increase the plurality of demultiplexer nodes when a threshold of processing capacity is reached.”
        Claim 3: “The system of claim 1, wherein the recognizer is configured to determine distinguishing geometric features of the object.”
        Claim 4: “The system of claim 3, wherein the distinguishing geometric features comprise a contour of eye sockets, a nose, and a chin.”
    D. Ground 2: Claims 5-7 Are Obvious Over Ground 1 Prior Art in Further View of Yang and Romdhani
        Claim 5: “The system of claim 4, wherein the recognizer is configured to determine skin textures of the object.”
        Claim 6: “The system of claim 5, wherein the distinguishing geometric features or the skin textures are determined based on a three-dimensional model.”
        Claim 7: “The system of claim 6, wherein the three-dimensional model is a three-dimensional morphable model.”
    E. Ground 3: Claims 8-9 Are Obvious Over Ground 1 Prior Art in Further View of Trivedi
        Claim 8: “The system of claim 1, further comprising a processor and a camera for capturing the video.”
        Claim 9: “The system of claim 8, wherein the processor is configured to transform the video into a rectilinear format.”
    F. Ground 4: Claims 10-11 Are Obvious Over Ground 1 Prior Art in Further View of Singer
        Claim 10: “The system of claim 1, wherein the coordinator is configured to embed metadata about the object into the video, and wherein the metadata comprises a timestamp and a coordinate location of the object in the still frame images.”
        Claim 11: “The system of claim 1, wherein the coordinator is configured to generate a metadata stream corresponding to the image, wherein the metadata stream includes one or more timestamps corresponding to the image, and wherein the coordinator is configured to embed the metadata stream in the video.”
VII. CONCLUSION
CERTIFICATE OF SERVICE
`
Petition for Inter Partes Review of
U.S. Patent No. 11,126,853 B2

List of Exhibits

Exhibit No.  Description of Document

1001  U.S. Patent No. 11,126,853 B2 to Bartlett Wade Smith, IV (filed February 8, 2019, issued September 21, 2021) (“’853” or “’853 patent”)
1002  Declaration of Henry Houh, Ph.D. (“Houh”)
1003  U.S. Patent App. Pub. No. 2009/0141940 A1 to Liang Zhao et al. (filed December 3, 2008, published June 4, 2009) (“Zhao”)
1004  U.S. Patent App. Pub. No. 2014/0181668 A1 to Barry A. Kritt et al. (filed Dec. 20, 2012, published Jun. 26, 2014) (“Kritt”)
1005  U.S. Patent No. 7,574,016 B2 to Eran Steinberg et al. (filed June 26, 2003, issued August 11, 2009) (“Steinberg”)
1006  Kouzani, A.Z. et al., “Fractal Face Representation and Recognition,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp.1609-1613 (1997) (“Kouzani”)
1007  Yang, Ming-Hsuan et al., “Detecting Faces in Images: A Survey,” in IEEE Transactions On Pattern Analysis And Machine Intelligence, Vol. 24, No. 1 (2002) (“Yang”)
1008  Romdhani, S. et al., “Face Identification by Fitting a 3D Morphable Model using Linear Shape and Texture Error Functions,” European Conf. on Computer Vision (2002) (“Romdhani”)
1009  U.S. Patent App. Pub. No. 2006/0187305 A1 to Mohan M. Trivedi et al. (PCT filed July 1, 2003, published August 24, 2006) (“Trivedi”)
1010  U.S. Patent App. Pub. No. 2011/0305394 A1 to David William Singer et al. (filed June 15, 2010, published December 15, 2011) (“Singer”)
1011  Excerpts from Peter Lubbers et al., Pro HTML5 Programming: Powerful APIs for Richer Internet Application Development (2010)
1012  Excerpts from Microsoft Computer Dictionary (5th ed. 2002)
1013  Excerpts from Jura van Vliet et al., Programming Amazon EC2 (2011)
1014  Xiutao Tang et al., “Facial Image Recognition Based on Fractal Image Encoding,” in Bell Labs Technical Journal 15(1) (2010)
1015  Prosecution History of U.S. Patent No. 11,126,853 B2
1016  U.S. Patent No. 9,697,230 to Henry Houh et al. (filed Mar. 31, 2006, issued Jul. 4, 2017)
1017  Cellular South’s Opposition to Motion to Dismiss (Document 28) in No. 6:24-cv-00245-DAE (W.D. Tex.)
1018  Declaration of Ingrid Hsieh-Yee, Ph.D. (“Hsieh-Yee”)
1019  Proof of Service of Complaint
`
I. MANDATORY NOTICES UNDER §42.8(A)(1)
A. Real Party-In-Interest under §42.8(b)(1)
Google LLC (“Petitioner”) is the real party-in-interest to this IPR petition.1

1 Google LLC is a subsidiary of XXVI Holdings Inc., which is a subsidiary of Alphabet Inc. XXVI Holdings Inc. and Alphabet Inc. are not real parties in interest to this proceeding.

B. Related Matters under §42.8(b)(2)
The ’853 patent is the subject of pending litigation involving Petitioner: Cellular South, Inc. v. Google LLC, Case No. 4:25-cv-01487-YGR (N.D. Cal.). Petitioner was served on May 10, 2024. (EX1019, p.001.) The case was originally filed in the Western District of Texas and was transferred to the Northern District of California on February 12, 2025.

C. Lead and Back-Up Counsel under §42.8(b)(3)
Petitioner provides the following designation of counsel.

LEAD COUNSEL

Heidi L. Keefe (Reg. No. 40,673)
hkeefe@cooley.com
COOLEY LLP
ATTN: Patent Group
1299 Pennsylvania Ave. NW, Suite 700
Washington, DC 20004
Tel: (650) 843-5001
Fax: (650) 849-7400

BACK-UP COUNSEL

Andrew C. Mace (Reg. No. 63,342)
amace@cooley.com

Mark R. Weinstein (Admission pro hac vice to be requested)
mweinstein@cooley.com

Reuben Chen (Admission pro hac vice to be requested)
rchen@cooley.com

Alexandra D. Leeper (Admission pro hac vice to be requested)
aleeper@cooley.com

COOLEY LLP
ATTN: Patent Group
1299 Pennsylvania Ave. NW, Suite 700
Washington D.C. 20004
`
D. Service Information
This Petition is being served by Federal Express to the attorney of record for the ’853 patent, 27890 - STEPTOE LLP/DC, 1330 CONNECTICUT AVENUE, N.W., WASHINGTON, DC 20036. Petitioner consents to electronic service at the addresses provided above for lead and back-up counsel.
`
II. FEE PAYMENT
Petitioner requests review of 11 claims, with a $51,875 payment.

III. REQUIREMENTS UNDER §§ 42.104 AND 42.108 AND CONSIDERATIONS UNDER §§ 314(A) AND 325(D)
A. Grounds for Standing
Petitioner certifies that the ’853 patent is available for IPR and that Petitioner is not barred or otherwise estopped.
`
B. Identification of Challenge and Statement of Precise Relief Requested
Petitioner requests IPR institution based on:

Ground  Claims  Basis for Challenge under §103
1       1-4     Zhao in view of Kritt, Steinberg, and Kouzani
2       5-7     Ground 1 prior art in further view of Yang and Romdhani
3       8-9     Ground 1 prior art in further view of Trivedi
4       10-11   Ground 1 prior art in further view of Singer
`
Submitted with this Petition is the Declaration of Henry Houh, Ph.D. (EX1002) (“Houh”), a qualified technical expert. (Houh, ¶¶1-15, App’x A.)

C. Considerations Under §§ 314(a) and 325(d)
Petitioner respectfully submits that there is no section 314(a) or 325(d) issue that would warrant discretionary denial of the Petition.

Petitioner hereby stipulates that if this IPR is instituted, then Petitioner will not pursue in the related pending litigation the specific grounds of invalidity that were raised or that reasonably could have been raised under 35 U.S.C. §§ 102 or 103 on the basis of prior art patents or printed publications in this IPR. To avoid any doubt, if the PTAB declines institution or rescinds institution of IPR, then Petitioner reserves the right to pursue any grounds of invalidity, including but not limited to the grounds raised or that reasonably could have been raised in this IPR, in the related pending litigation.
`
§314(a): The General Plastic factors are not relevant; this is the first and only IPR petition filed by Petitioner with respect to the ’853 patent.

Nor do the Fintiv factors support discretionary denial under §314(a). As noted above, the pending litigation involving Petitioner was recently transferred to the Northern District of California. Before transfer, the pending litigation was still in an early pre-Answer stage, with no substantive discovery or claim construction having taken place. Since being transferred to the Northern District of California, the litigation has not substantively progressed. An initial case management conference is currently set for June 20, 2025. There is no schedule or trial date. Petitioner also intends to move to stay the litigation pending resolution of IPR.

§325(d): Advanced Bionics does not apply to any of the references relied upon in Ground 1 or Ground 2 because none were presented during prosecution.
`
With respect to Trivedi and Singer, cited in Ground 3 and Ground 4, respectively, Petitioner cites them for the same claim limitations in dependent claims 8-11 for which the Examiner cited them during prosecution, and which was not distinguished or disputed by the applicant. (See EX1015, pp.00145 (“Trivedi further teaches further comprising a processor and a camera for capturing the video”; “Trivedi further teaches wherein the processor is configured to transform the video into rectilinear format”), 00147-00148 (“Singer…teaches wherein the coordinator is configured to embed metadata about the object into the video, and wherein the metadata comprises a timestamp and a coordinate location of the object in the still frame images”; “Singer further teaches wherein the coordinator is configured to generate a metadata stream corresponding to the image, wherein the metadata stream includes one or more timestamps corresponding to the image, and wherein the coordinator is configured to embed the metadata stream in the video”), 00167-00169 (Applicant Reply to Office Action of October 29, 2020).)

Petitioner reserves the right to address and respond to any assertions that Patent Owner may raise regarding discretionary factors.
`
IV. OVERVIEW OF THE PATENT
A. Level of Ordinary Skill
A person of ordinary skill would have possessed a bachelor’s degree in electrical engineering, computer science, or similar field, with two years of experience in developing and implementing computer software for processing and/or analyzing multimedia content, such as audio, video, or image data. A person could also have qualified as a person of ordinary skill with some combination of (1) more formal education (such as a master’s of science degree) and less technical experience, or (2) less formal education and more technical or professional experience. (Houh, ¶¶18-23.)
`
B. Specification Overview
The ’853 patent describes its purpose at a very high level. It states that it “relates to a method and a system for generating various and useful data from source media, such as videos and other digital content.” (’853, 1:12-14.) And the summary of the invention confirms the broad brush the patent purports to take: “The present invention is generally directed to a method to generate data from video content, such as text and/or image-related information.” (’853, 1:34-36.)
`
The ’853 patent purports to offer a solution to the common problem of how to describe visual information from audio-visual content. (Houh, ¶41.) It is in many ways similar to a museum curator providing text guides to an artist’s work, or a movie critic explaining the cinematography of a film. The curator or critic may distinguish images in the foreground from those in the background, and explain visual techniques such as object placement, shading, and texturing that make the artwork or scene visually effective. The curator or critic can provide commentary that tells the viewer what to look for, where to look, and (in a film) when to look.

Audio-visual data such as videos long predates the ’853 patent, and prior artists similarly had long been working on various ways to analyze and describe video content. (Houh, ¶41.) By the time of the alleged priority date (June 29, 2016), the fields of audio and image analysis for video content were well-developed. Indeed, the Background section of the ’853 patent acknowledges known image searching techniques to identify matching and similar images, as well as audio-to-text algorithms for transcribing text from audio. (’853, 1:21-29.)
`
The specification of the ’853 patent provides primarily a series of high-level, functional explanations for how to implement the alleged invention, for which it relies on well-known techniques. (See, e.g., ’853, 14:30-15:47 (describing various well-known “recognition models” for image detection).) For example, Figure 10 from the ’853 patent, shown below, “illustrates exemplary system architecture with an exemplary process flow.”
`
`
(’853, Fig. 10, 3:50-51.) Referring to Figure 10, the ’853 patent states that “[s]ource media can be provided to a coordinator.” (’853, 16:25-28.) The specification describes the coordinator functionally. The coordinator, among other things, “can be responsible for identifying new job requests and managing the routing of request input and result data output in the system.” (’853, 16:31-33.) “[T]he coordinator can send uploaded source media to a splitter and a demuxer/demultiplexer.” (’853, 16:60-63.) Like the coordinator, the splitter and demuxer/demultiplexer are described functionally. Then, data from the source media, such as faces, text, and dialog, can be extracted. (’853, 16:63-67.) “[T]he extracted data can be sent through a filter to compare fractal located items…for recognition.” (’853, 17:4-6.) The ’853 patent does not purport to have invented fractals; rather, the use of fractals for image recognition is one of the known recognition techniques identified in the ’853 specification. (’853, 14:30-15:47.) Based on the results of the fractal filtering, the fractal can be refined and the fractal training set updated. (’853, 17:6-10.)
`
The ’853 patent further states that, “[o]ften images are segmented only to locate and positively identify one or very few main images in the foreground of a given frame,” and “[t]he non-primary or background images are often treated as noise.” (’853, 9:8-12.) “Nevertheless,” according to the ’853 patent, “these can provide useful information, context, and/or branding for example,” and “it may become necessary or desirable to detect more detail from a frame or set of frames.” (’853, 9:12-23.) “In such circumstances, the computational thresholds for identification of an object, face, etc. can be altered according to a then stated need or desire for non-primary, background, obstructed and/or grainy type images. Such image identification threshold adjustment capability can be implemented, for example, as a user-controlled interface, dial, slider, or button, which enables the user to make adjustments to suit specific needs or preferences.” (’853, 9:23-30.)
`
V. CLAIM CONSTRUCTION
Petitioner does not believe express claim construction is necessary at this time. The prior art cited herein, as demonstrated below, renders the challenged claims obvious under any reasonable construction. Petitioner therefore respectfully submits that, for purposes of this IPR, express constructions are not required.
`
VI. THE CHALLENGED CLAIMS ARE UNPATENTABLE
A. Overview of Grounds
The Petition primarily relies on Zhao (EX1003), a strong reference that alone discloses or suggests the primary features of the ’853 patent. Like the ’853 patent, Zhao discloses a system that performs detection and recognition of objects (such as people and faces) in video images to generate data, including by creating and updating object models used for detection and recognition, and by generating data about the video that can be used for indexing and retrieval. (Houh, ¶52 (citing Zhao, ¶¶0051, 0053, 0055).)
`
Ground 1 also cites Kritt (EX1004), Steinberg (EX1005), and Kouzani (EX1006) to provide express confirmation of well-known features either already suggested in Zhao or that would have been apparent and obvious. Kritt is cited primarily for its teachings related to the well-known concept of distributed/multi processing in connection with the analysis of claim 1[a] and the recited “coordinator,” “splitter,” and “demultiplexer nodes.” (Houh, ¶¶55-60.) Steinberg is cited primarily in connection with claim 1[b] and its recitation of an image detector that is “adjustable to increase detection of non-primary images in the video.” (Houh, ¶¶61-64.) And Kouzani, which confirms that “fractal”-based image recognition was well-known to a person of ordinary skill, is cited primarily for claim 1[c] and its recitation of comparing an image of the object to a “fractal” and updating “the fractal” with the image. (Houh, ¶¶65-70.)
`
Ground 2 further cites Yang (EX1007) and Romdhani (EX1008) in connection with dependent claims 5-7 and their limitations related to “skin textures” and a “three-dimensional model.” (Houh, ¶¶71-81.)

Ground 3 cites Trivedi (EX1009) in connection with dependent claims 8 and 9, and Ground 4 cites Singer (EX1010) in connection with dependent claims 10 and 11. (Houh, ¶¶82-92.) As noted in Part III.C above, Petitioner cites them for the same claim limitations for which the Examiner cited them during prosecution, and which was not distinguished or disputed by the applicant.
`
B. Prior Art Status of Relied-Upon References
Because the ’853 patent claims priority to an earliest application filed June 29, 2016, AIA law applies to the challenged claims. All of the references cited in the Grounds qualify as prior art under §102(a)(1) because they were published before June 29, 2016. (EX1018 (Hsieh-Yee), ¶¶26-53 (Kouzani), 54-75 (Yang), 76-93 (Romdhani).)
`
C. Ground 1: Claims 1-4 Are Obvious Over Zhao in View of Kritt, Steinberg, and Kouzani
Independent Claim 1: “A system for generating data from a video, comprising:” (Claim 1[pre])
Assuming the preamble provides a claim limitation, Zhao discloses it. Zhao discloses a system that performs detection and recognition of objects, such as people and faces, in video images to generate data, including by creating and updating object models used for detection and recognition, and by generating data about the video that can be used for indexing and retrieval. (Houh, ¶¶93-94.) Figure 2 of Zhao, discussed in more detail in the analysis below, illustrates at a high level this overall process of generating data from a video performed by the system:

(Zhao, Fig. 2; see also id., ¶0036 (“FIG. 2 illustrates a flowchart describing the steps involved, from a high-level, in one embodiment of the present system for detecting, modeling, recognizing, and tracking object images throughout one or more videos.”).)
`
Zhao explains, for example, that it “generally relate[s] to systems and methods for detection, modeling, recognition, and tracking of objects within video content” and to “indexing and retrieval systems for videos based on generated object models.” (Zhao, ¶0051.) “These objects include people, faces, articles of clothing, plants, animals, machinery, electronic equipment, food, and virtually any other type of image that can be captured or presented in video.” (Zhao, ¶0051.) Zhao’s system “may be operated in a computer environment” and “any results or outputs relating to detection, modeling, recognition, indexing, and/or tracking of object images within videos may be stored in a database” or otherwise output. (Zhao, ¶0055.) Zhao states that its system is “useful for a wide variety of applications, including video indexing and retrieval, video surveillance and security, unknown person identification, advertising, and many other fields.” (Zhao, ¶0053.)

Therefore, Zhao discloses “a system for generating data from a video.” And as explained below, the system as claimed is obvious over Zhao, Kritt, Steinberg, and Kouzani.
`
(a) “a coordinator communicatively coupled to a splitter and to a plurality of demultiplexer nodes, wherein the splitter is configured to segment the video, wherein the demultiplexer nodes are configured to extract audio files from the video and to extract still frame images from the video;” (Claim 1[a])
As explained below, claim 1[a] is disclosed by and obvious over Zhao in view of Kritt. (Houh, ¶¶96-113.)
`
For context, the ’853 patent describes a “coordinator,” “splitter,” and “demultiplexer” in terms of functions they may perform, and does not state any particular form or structure. (Houh, ¶97.) The ’853 patent states, for instance, that “[s]ource media can be provided to a coordinator, which can be a non-visual component that can perform one or more of several functions.” (’853, 16:26-27.) For example, “[t]he coordinator can direct distributed processing and aggregation.” (’853, 18:16-17.) Referring to Figure 10, reproduced below, the ’853 patent states that “the coordinator can send uploaded source media to a splitter and a demuxer/demultiplexer”:
`
(’853, Fig. 10, 16:60-63; see also id., 3:50-51.) A “splitter” can be invoked to “‘slice’ [media] assets into media segments and/or multiple sub assets comprising the entire stream.” (’853, 17:32-37.) A “demultiplexer” can “extract audio files from the video and/or to extract still frame images from the video.” (’853, 1:45-47.) For example, “a demultiplexer component can strip audio data from segments and store audio streams for future analysis.” (’853, 17:61-62.)
`
A person of ordinary skill would recognize that the arrangement of a “coordinator,” “splitter,” and “plurality of demultiplexer nodes” recited in claim 1[a] is merely an application of the well-known concept of distributed processing (or multiprocessing) to video processing, to obtain the obvious benefits that processing time can be reduced, and more processing overall can be performed, by sharing the workload and performing multiple operations in a parallel fashion. (Houh, ¶98.) The ’853 patent likewise explains that “the splitter can be configured to segment the video,” then later states, for example, that “[t]he video-to-data engine can segment the video into chunks for distributed, or parallel, processing….Distributed processing in this context can mean that the processing time for analyzing a video from beginning to end is a fraction of the play time of the video. This can be accomplished by breaking the processes into sections and processing them simultaneously.” (’853, 2:35-36, 6:15-21.) By 2016, distributed/multi processing was well-known and neither novel nor non-obvious. (Houh, ¶98 (citing EX1012, pp.005 (entry for “distributed processing”), 006 (entry for “multiprocessing”)).) And as explained below, claim 1[a] is obvious over Zhao in view of Kritt.
`
Turning to Zhao, Petitioner will first explain how Zhao teaches both “splitter” operations and “demultiplexer” operations. Petitioner will then explain how Zhao discloses and renders obvious a “coordinator” before turning to the combination with Kritt.

Zhao itself discloses a “splitter” “configured to segment the video” in the form of processes that perform “shot/scene boundary detection” to split a video into shots or scenes that are extracted for processing. (Houh, ¶100.) Zhao explains:

    Once the video has been retrieved, the video undergoes a process of
    detection of objects and extraction of object features from video images
    (processes 210, 220). Concurrently, shots or scenes are detected in the
    video via a shot/scene boundary detection procedure (process 215). The
    shots or scenes of the video are extracted to provide a sequence of
    images for analysis by the system, and each detected shot or scene is
    individually analyzed. Generally, the term “shot” or “scene” refers to a
    grouping of frames or images recorded by either a stationary or
    smoothly-moving camera, with little or no background change between
    the frames, corresponding to one continuous time period.

(Zhao, ¶0057.) Figure 2 of Zhao illustrates this process with shot boundary detection process 215:
`
(Zhao, Fig. 2 (highlighting added).)
`
With respect to the “demultiplexer,” Zhao also discloses at least some of the claimed “demultiplexer” functionality, i.e., “extract[ing] still frame images from the video.” (Houh, ¶101.) Zhao describes “receiving a video file, wherein the video file comprises a plurality of video frames.” (Zhao, ¶0023.) Zhao teaches that the object detection and recognition processes involve extracting still frame images from the video. (Houh, ¶101.) For example, Zhao explains that the object recognition process is performed on an “image or frame extracted from a video.”2 (Zhao, ¶0075 (“The process 400 shown in FIG. 4 is for one image or frame extracted from a video, but as will be appreciated, this process is typically repeated for each optimal object image in the video.”).) Zhao similarly explains that the object detection process involves “detection of objects and extraction of object features from video images (processes 210, 220).” (Zhao, ¶0057; see also id., ¶0059 (“Once an object has been detected within a video frame…”).)
`
With respect to “demultiplexer” functionality to “extract audio files from the video,” Petitioner notes that the claims of the ’853 patent do not recite any additional limitations related to “audio” or the extracted “audio files.” (Houh, ¶102.) Zhao likewise does not discuss audio information, which is not surprising because audio processing is not a focus of Zhao’s purported invention. (Id.) But it would have been obvious for Zhao’s system to further include processes for “extracting audio files from the video.” (Id.) For example, Zhao discloses performing its techniques on videos such as episodes of television shows such as the Gilmore Girls (Zhao, e.g., ¶¶0040-0041, 0093, Fig. 6), which a person of ordinary skill would have understood and found obvious to have included one or more audio tracks. (Houh, ¶102.) Zhao more generally teaches that its tec



